Using word embeddings for immigrant and refugee stereotype quantification in a diachronic and multilingual setting

Bibliographic Details
Authors: Sorato, Danielly; Lundsteen, Martin; Colominas Ventura, Carme; Zavala-Rojas, Diana
Resource type: article
Status: Published version
Publication date: 2024
Country: Spain
Institution: Universitat Pompeu Fabra
Repository: Repositorio Digital de la UPF
OAI Identifier: oai:repositori.upf.edu:10230/59542
Online access: http://hdl.handle.net/10230/59542
http://dx.doi.org/10.1007/s42001-023-00243-6
Access level: open access
Keywords: Word embeddings
Computational sociolinguistics
Social bias
Stereotypes
Diachronic analysis
Multilingual analysis
Description
Summary: Word embeddings are efficient machine-learning-based representations of human language used in many Natural Language Processing tasks nowadays. Because they learn the underlying word association patterns present in large volumes of data, various sociolinguistic phenomena, such as social stereotypes, can be observed in the embedding semantic space. The use of stereotypical framing in discourse can be detrimental and induce misconceptions about certain groups, such as immigrants and refugees, especially when used by media and politicians in public discourse. In this paper, we use word embeddings to investigate immigrant and refugee stereotypes in a multilingual and diachronic setting. We analyze the Danish, Dutch, English, and Spanish portions of four different multilingual corpora of political discourse, covering the 1997–2018 period. Then, using a Bayesian multilevel framework, we measure the effect of sociopolitical variables, such as the number of offences committed and the size of the refugee and immigrant groups in the host country, on our measurements of stereotypical association. Our results indicate the presence of stereotypical associations towards both immigrants and refugees in all four languages, and that immigrants are overall more strongly associated with the stereotypical frames than refugees.
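
The summary does not say which association metric the authors use. As a rough illustration of how a stereotypical association can be read off an embedding space, the sketch below compares the mean cosine similarity of a target group's terms and a reference group's terms to a set of frame terms. The function names, the choice of mean cosine similarity, and the two-dimensional toy vectors are all assumptions made for this example, not the paper's method or data.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(group_vecs, frame_vecs):
    """Mean pairwise cosine similarity between group terms and frame terms.

    Hypothetical helper: the record does not state the paper's metric.
    """
    return float(np.mean([cosine(g, f) for g in group_vecs for f in frame_vecs]))


def relative_association(target_vecs, reference_vecs, frame_vecs):
    """How much closer the target group is to the frame than the reference group.

    Positive values mean the target terms sit nearer the frame terms.
    """
    return association(target_vecs, frame_vecs) - association(reference_vecs, frame_vecs)


# Toy vectors invented for illustration only. In practice they would be
# looked up in embeddings trained separately per language and time slice.
target = np.array([1.0, 0.0])     # placeholder for a target-group term
reference = np.array([0.0, 1.0])  # placeholder for a reference-group term
frame = np.array([1.0, 0.0])      # placeholder for a stereotypical-frame term

score = relative_association([target], [reference], [frame])
print(score)
```

In a diachronic setting, the same score could be computed once per time slice and language, and the resulting series then related to sociopolitical variables in a separate statistical model.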