Automated similarity detection: identifying duplicated requirements

Bibliographic details
Author: Motger de la Encarnacion, Quim
Resource type: master's thesis
Publication date: 2019
Country: Spain
Institution: Universitat Oberta de Catalunya (UOC)
Repository: O2, institutional repository of the UOC
OAI identifier: oai:openaccess.uoc.edu:10609/105807
Online access: http://hdl.handle.net/10609/105807
Access level: open access
Keywords (also indexed in Spanish and Catalan): requirements engineering
similarity detection
duplicated requirements
Artificial intelligence -- TFM
Description
Abstract: Machine Learning (ML) and Natural Language Processing (NLP) are two of the best-known areas of Artificial Intelligence (AI). ML is a general-purpose technology that uses data to learn real-world knowledge and to improve the reliability of a specific action, typically making autonomous predictions from partial observations of data. NLP, in turn, deals with building representations of natural-language features from textual information. One field where NLP and ML can be applied is Requirements Engineering (RE), the set of Software Engineering (SE) processes focused on managing the requirements that describe a system. Among the challenges of RE, the detection of duplicated requirements stands out: if ignored, duplicates introduce redundancy into a project's textual information, which may in turn lead to duplicated tasks. Moreover, both the automation of this process and the standardized use of specific, accurate tools are still at the research stage. This master's thesis presents a state-of-the-art analysis of automated requirements similarity detection using AI techniques, applied to detecting duplicates among project requirements. Building on a literature review, the thesis aims to provide a practical evaluation and a development proposal for duplicate detection in SE project requirements. The work was developed within the OpenReq project, an EU Horizon 2020 project whose goal is "to build an intelligent decision system for community-driven RE". This collaboration made it possible to evaluate the algorithms developed in this work on real requirements data.
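The abstract does not describe which similarity algorithms the thesis evaluates. Purely to illustrate what pairwise similarity detection over requirements means, the following is a minimal Python sketch that swaps in a simple bag-of-words cosine similarity. It is not the thesis's method or an OpenReq API: the function names, the tokenizer, the 0.8 threshold and the sample requirements are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (naive, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty requirement is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_duplicate_candidates(
    requirements: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Compare every pair of requirements and return (id_a, id_b, score) for
    pairs whose score reaches the hypothetical threshold, highest score first."""
    vectors = {rid: Counter(tokenize(text)) for rid, text in requirements.items()}
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample requirements, not taken from OpenReq data.
    sample = {
        "REQ-1": "The system shall allow users to reset their password by email.",
        "REQ-2": "The system shall allow users to reset their password via email.",
        "REQ-3": "The dashboard shall display monthly sales reports.",
    }
    for id_a, id_b, score in find_duplicate_candidates(sample):
        print(f"{id_a} <-> {id_b}: {score:.2f}")
```

With the sample data, only REQ-1 and REQ-2 are flagged (they share 10 of their 11 distinct tokens, a score of about 0.91), while REQ-3 scores about 0.23 against either of them. A real duplicate detector would typically add stop-word removal, weighting such as TF-IDF, or learned embeddings, and would tune the threshold on labelled data.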