The temporal flow of relevant terms: an analysis in UFMG theses from 2007 to 2018 in human sciences

Bibliographic Details
Authors: Mesquita, Luiz Antonio Lopes; Dias, Célia da Consolação; Souza, Renato Rocha
Resource type: article
Status: Published version
Publication date: 2021
Country: Brazil
Institution: Universidade Federal de Minas Gerais (UFMG)
Repository: Múltiplos Olhares em Ciência da Informação
Language: Portuguese
OAI Identifier: oai:periodicos.ufmg.br:article/37241
Online access: https://periodicos.ufmg.br/index.php/moci/article/view/37241
Access level: open access
Keywords: Indexação Automática
Sintagmas Nominais
Recuperação da Informação Temporal
Temporal Information Retrieval
Automatic Indexing
Noun Phrase
Description
Summary: The general objective of this research was to analyze whether the relevance values of terms show a characteristic temporal variation over the period in which texts are produced, and whether that variation can contribute as a criterion for the automatic indexing process. The doctoral theses of the graduate programs (PPGs) in Human Sciences at UFMG were analyzed: seven PPGs were considered, each treated as a separate corpus, covering 929 theses defended over a twelve-year period, from 2007 to 2018. The terms considered were all the noun phrases contained in the thesis texts. Each noun phrase received a value for its relevance as a descriptor, based on the frequency of the term in the thesis itself (TF, Term Frequency) and on the inverse of the frequency of occurrence of the term across all theses of the same PPG (IDF, Inverse Document Frequency). Within each PPG, the theses were divided into 12 groups, and for each group the average defense date and the average consolidated score of the relevant terms were calculated. As a result, the characteristic behavior of each PPG was identified through a scatter plot of the average relevance score over time. A trend line, with its respective R², was added to the graph of each of the seven PPGs and analyzed individually. All temporal distribution behaviors were characterized as polynomial equations and applied as a criterion for automatic indexing.
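
The scoring and trend-fitting steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes raw counts for TF, log(N / df) for IDF, and an ordinary least-squares polynomial fit; the record does not say which weighting variant or polynomial degree the article uses. The function names (tf_idf, polyfit, r_squared) and all sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term of each document as TF * IDF.

    docs: list of term lists (here, made-up noun phrases per thesis).
    Assumption: raw count for TF and log(N / df) for IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] of the normal equations.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    pred = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up corpus of three "theses", each reduced to its noun phrases.
docs = [
    ["automatic indexing", "noun phrase", "automatic indexing"],
    ["noun phrase", "temporal retrieval"],
    ["noun phrase", "thesis corpus"],
]
scores = tf_idf(docs)

# Made-up group averages: x is years since the first defense year,
# y is the average relevance score of the group.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.5, 3.0, 2.5, 1.0]
coeffs = polyfit(xs, ys, degree=2)
r2 = r_squared(xs, ys, coeffs)
print(scores[0], coeffs, r2)
```

In this sketch a noun phrase that appears in every thesis of the corpus scores zero, since log(N / N) = 0, which is the usual reason IDF is combined with TF. Time is expressed as an offset from the first year rather than as a calendar year to keep the normal equations well conditioned.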