Labour Statistics vs. Static Word Embeddings: a Comparison of Gender Bias
| Author: | |
|---|---|
| Resource type: | master's thesis |
| Publication date: | 2021 |
| Country: | Spain |
| Institution: | Universidad del País Vasco |
| Repository: | Addi. Archivo Digital para la Docencia y la Investigación |
| OAI Identifier: | oai:addi.ehu.eus:10810/61830 |
| Online access: | http://hdl.handle.net/10810/61830 |
| Access level: | open access |
| Keywords: | gender bias, static word embeddings, ethics, artificial intelligence |
| Summary: | This project explores the relationship between labour statistics and three static word embedding models, GloVe, word2vec and fastText, in both English and Spanish. The aim is to determine how gender bias in real-world occupational data differs from the gender bias encoded in word embedding spaces. To do so, several linguistic data sets were created from what previous authors called "extreme she" and "extreme he" occupations. To better assess the models' behaviour, these results were compared with those for gender-neutral professions. In this way, the study determines how the outcomes vary across static word embeddings, training corpora and natural languages, in order to uncover the patterns that underlie them. |
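The summary does not state which bias measure the thesis uses. One common choice in this line of work, assumed here, scores each occupation by how much closer its vector lies to *he* than to *she* (cosine similarity difference). Those scores can then be correlated with the share of women in each occupation taken from labour statistics. The sketch below is illustrative only: the function names (`gender_score`, `compare_with_labour_stats`), the plain-dictionary embedding format and the default anchor words are hypothetical, and loading real GloVe, word2vec or fastText vectors is left out.

```python
import math
from typing import Dict, List, Mapping, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_score(word: Vector, he: Vector, she: Vector) -> float:
    """Hypothetical bias score: > 0 leans towards 'he', < 0 towards 'she'."""
    return cosine(word, he) - cosine(word, she)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def compare_with_labour_stats(
    embeddings: Mapping[str, Vector],
    occupations: List[str],
    female_share: Dict[str, float],
    he: str = "he",   # assumption: for Spanish, anchors such as "él"/"ella"
    she: str = "she",
) -> float:
    """Correlate embedding bias scores with the share of women per occupation.

    A strongly negative result means that occupations leaning towards 'he'
    in the embedding space tend to employ fewer women in the statistics.
    """
    scores = [
        gender_score(embeddings[o], embeddings[he], embeddings[she])
        for o in occupations
    ]
    shares = [female_share[o] for o in occupations]
    return pearson(scores, shares)
```

A correlation close to -1 would indicate that the embedding space mirrors the labour statistics. Weaker or positive values would point to a mismatch between the two, which is the kind of difference the summary sets out to examine. Gender-neutral professions would be expected to score near zero under this measure.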