Using Shallow and Deep Learning to Automatically Detect Hate Motivated by Gender and Sexual Orientation on Twitter in Spanish


Bibliographic details
Authors: Arcila Calderón, Carlos; Jiménez Amores, Francisco Javier; Sánchez Holgado, Patricia; Blanco Herrero, David
Format: article
Status: Published version
Publication date: 2021
Country: Spain
Resources: Universidad de Salamanca (USAL)
Repository: GREDOS. Repositorio Institucional de la Universidad de Salamanca
OAI Identifier: oai:gredos.usal.es:10366/160896
Online access: http://hdl.handle.net/10366/160896
Access Level: open access
Keywords: Supervised classification
Deep learning
Machine learning
Misogyny
Feminism
Sexual orientation
Gender identity
Gender discrimination
Hate speech
Twitter
63 Sociology
6308 Social Communications
Description
Abstract: [EN] The increasing phenomenon of “cyberhate” is concerning because of the potential social implications of this form of verbal violence, which is aimed at already-stigmatized social groups. According to information collected by the Ministry of the Interior of Spain, the category of sexual orientation and gender identity is subject to the third-highest number of registered hate crimes, ranking behind racism/xenophobia and ideology. However, most of the existing computational approaches to online hate detection simultaneously attempt to address all types of discrimination, leading to weaker prototype performances. These approaches focus on other reasons for hate, primarily racism and xenophobia, and usually focus on English messages. Furthermore, few detection models have used manually generated databases as a training corpus. Using supervised machine learning techniques, the present research sought to overcome these limitations by developing and evaluating an automatic detector of hate speech motivated by gender and sexual orientation. The focus was Spanish-language posts on Twitter. For this purpose, eight predictive models were developed from an ad hoc generated training corpus, using shallow modeling and deep learning. The evaluation metrics showed that the deep learning algorithm performed significantly better than the shallow modeling algorithms, and logistic regression yielded the best performance of the shallow algorithms.
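
As a rough illustration of the kind of shallow supervised classifier the abstract mentions, the following is a minimal sketch of a bag-of-words logistic regression written in plain Python. It is not the authors' implementation: this record does not state their features, hyperparameters, or evaluation metrics, so the function names (`featurize`, `train_logreg`, `predict`, `f1_score`), the learning rate, the epoch count, the choice of F1, and the placeholder training texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def featurize(text, vocab):
    """Return bag-of-words counts for `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit binary logistic regression with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Label a feature vector 1 (flagged) or 0 (not flagged)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder texts and labels (hypothetical, NOT data from the study).
train_texts = [
    "insult_a insult_b you",
    "have a nice day",
    "insult_b again",
    "nice weather today",
]
train_labels = [1, 0, 1, 0]

vocab = sorted({word for text in train_texts for word in text.lower().split()})
X = [featurize(text, vocab) for text in train_texts]
w, b = train_logreg(X, train_labels)
preds = [predict(x, w, b) for x in X]
```

In practice a comparison like the one the abstract describes would score each model on held-out posts rather than on the training set; the sketch scores training data only to stay short.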