Predicció de l'ús del català mitjançant la classificació supervisada (Predicting the use of Catalan through supervised classification)

Grimaldo Moreno, Francisco; López Iñesta, Emilia; Perucho Pla, Manel; Querol Puig, Ernest

Bibliographic Details
Authors:	Grimaldo Moreno, Francisco; López Iñesta, Emilia; Perucho Pla, Manel; Querol Puig, Ernest
Resource type:	article
Publication date:	2016
Country:	Spain
Institution:	Universitat Oberta de Catalunya (UOC)
Repository:	O2, institutional repository of the UOC
OAI Identifier:	oai:openaccess.uoc.edu:10609/70667
Online access:	http://hdl.handle.net/10609/70667
Access Level:	open access
Keywords:	ús lingüístic; predicció; intel·ligència artificial; aprenentatge automàtic; classificació supervisada; uso lingüístico; predicción; inteligencia artificial; aprendizaje automático; clasificación supervisada; linguistic use; prediction; artificial intelligence; machine learning; supervised classification; Catalan language -- Usage; Català -- Ús; Catalán -- Uso

Description
Summary:	One of the main challenges that the sociology of language has faced is determining the variables that govern the use of a language. Inspired by the field of artificial intelligence, in this study we use machine learning as a suitable approach to implement computational methods that permit the induction of linguistic use models from the available data. We aim to improve on the level of prediction for the degree of use of the Catalan language achieved so far. To this end, we have used three supervised classification techniques: Naive Bayes, decision trees, and support vector machines. We needed an empirical corpus that would allow us to test the prediction level of a theoretical model as well as its validity within different sociolinguistic situations. To the best of our knowledge, the work by Querol is the one that achieves the highest prediction success across all the Catalan-speaking territories. Thus, the research presented in this paper uses those data to conclude that supervised classification can successfully produce prediction models for the degree of use of Catalan that outperform previous attempts and that allow us to identify the most relevant variables of the problem. Moreover, it also helps us to solve the methodological problem of the division of linguistic groups and shows that the use of a language is a continuous system rather than a discrete one.
