Analysis of extraction of descriptors as noun phrases through the OGMA software

Corrêa, Renato Fernandes; Bazílio, Luiz Henrique Teixeira

Bibliographic Details
Authors:	Corrêa, Renato Fernandes; Bazílio, Luiz Henrique Teixeira
Resource type:	article
Status:	Published version
Publication date:	2017
Country:	Brazil
Institution:	Universidade Federal de Santa Catarina (UFSC)
Repository:	Encontros Bibli
Language:	Portuguese
OAI Identifier:	oai:periodicos.ufsc.br:article/46434
Online access:	https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n50p44
Access level:	open access
Keywords:	Indexação automática; Sintagmas Nominais; Palavras-chaves; Teses e dissertações; software OGMA; Automatic indexing; Noun Phrases; Keywords; Theses and dissertations; OGMA software

Description
Summary:	This work investigates automatic indexing by noun phrases of documents containing the title and abstract of 30 theses and dissertations written in Portuguese, drawn from three different areas of knowledge. The research method is exploratory, based on a literature review and an experiment. The experiment consisted of analyzing the OGMA software output for the document corpus and measuring the recall of the keywords present in the documents. The study presents a descriptive profile of the sequences of grammatical labels for present keywords that were extracted as noun phrases and for those that were not. It concludes that 68% of all keywords supplied by the authors appeared in the title or abstract of the thesis or dissertation; of these, 66% were extracted as noun phrases, which is the recall of present keywords achieved by the OGMA software. Present keywords that were not extracted mainly contained nouns or adjectives to which the software assigned an incorrect grammatical category. Present keywords that were extracted were mostly single nouns (30%), noun-adjective pairs (28%), and noun-preposition-noun trigrams (19%). OGMA achieved a good level of recall of present keywords, and this level could increase by almost 34% with adjustments to the part-of-speech tagger.
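
The two figures reported in the summary (the share of author keywords present in the title or abstract, and the recall of those present keywords among the extracted noun phrases) can be illustrated with a minimal Python sketch. It is not the authors' procedure and does not call OGMA: the `Document` class, the `keyword_recall` function, the case-insensitive substring matching, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str                # title plus abstract
    keywords: list[str]      # keywords supplied by the author
    noun_phrases: list[str]  # phrases from some extractor (hypothetical output)


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so comparisons are lenient."""
    return " ".join(s.lower().split())


def keyword_recall(docs: list[Document]) -> tuple[float, float]:
    """Return (presence_rate, recall_of_present_keywords).

    presence_rate: keywords found in the text / all keywords
    recall:        present keywords also extracted / present keywords
    Matching is a plain substring test, which is an assumption of this sketch.
    """
    total = present = extracted = 0
    for doc in docs:
        text = normalize(doc.text)
        phrases = {normalize(p) for p in doc.noun_phrases}
        for kw in doc.keywords:
            k = normalize(kw)
            total += 1
            if k in text:
                present += 1
                if k in phrases:
                    extracted += 1
    presence_rate = present / total if total else 0.0
    recall = extracted / present if present else 0.0
    return presence_rate, recall


# Illustrative data only; not taken from the study's corpus.
sample = [
    Document(
        text="Automatic indexing of theses using noun phrases",
        keywords=["automatic indexing", "noun phrases", "theses", "OGMA software"],
        noun_phrases=["automatic indexing", "noun phrases"],
    )
]
presence_rate, recall = keyword_recall(sample)
print(f"presence rate: {presence_rate:.2f}, recall of present keywords: {recall:.2f}")
```

Read this way, the summary's figures imply that roughly 0.68 × 0.66 ≈ 45% of all author keywords were both present in the text and extracted as noun phrases.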
