Labour Statistics vs. Static Word Embeddings: a Comparison of Gender Bias

Bibliographic Details
Author: Figueroa Vásquez, Andrés
Resource type: master's thesis
Publication date: 2021
Country: Spain
Institution: Universidad del País Vasco
Repository: Addi. Archivo Digital para la Docencia y la Investigación
OAI Identifier: oai:addi.ehu.eus:10810/61830
Online access: http://hdl.handle.net/10810/61830
Access level: open access
Keywords: gender bias
static word embeddings
ethics
artificial intelligence
Description
Summary: This project explores the relationship between labour statistics and three static word embedding models, GloVe, word2vec and fastText, in both English and Spanish. The aim is to see how gender bias in the real world differs from gender bias in word embedding spaces. To do so, several linguistic data sets were created from what previous authors called extreme she occupations and extreme he occupations. To better assess their behaviour, the results were compared with those for gender-neutral professions. In this way, the effect of using different static word embeddings, corpora and natural languages is determined, in order to discover the patterns that underlie them.