06 Evaluation of state-of-art phishing detection strategies based on machine learning

Bibliographic Details
Authors: Castaño, Felipe; Sánchez Paniagua, Manuel; Delgado Sotes, Juan José; Velasco Mata, Javier; Sepúlveda, Antonio; Fidalgo, Eduardo; Alegre, Enrique
Resource type: book chapter
Publication date: 2021
Country: Spain
Institution: Universidad de Castilla-La Mancha
Repository: RUIdeRA. Repositorio Institucional de la UCLM
OAI Identifier: oai:ruidera.uclm.es:10578/28607
Online access: http://doi.org/10.18239/jornadas_2021.34.06
http://hdl.handle.net/10578/28607
Access level: open access
Keywords: cybersecurity
Phishing Detection
URL
Artificial Intelligence
Machine Learning
NLP
Description
Summary: Phishing is one of the most common cyber-attacks. Machine Learning approaches can effectively deal with phishing detection. However, models are usually trained on datasets whose legitimate samples are landing pages without login forms, whereas legitimate login pages are a situation closer to the real-world problem. In this work, we presented the Phishing Index Login URL (PILU-60K), a dataset with URLs of both index pages and login pages. In addition, five of the most used Machine Learning models were implemented and tested on PILU-60K and compared against well-known datasets. We used the models trained on index pages and tested on login pages to determine whether performance was affected when the models have to classify login URLs. We also reviewed the performance of the models over time by training them on datasets from 2016 and 2017 and testing them on recent ones. Results showed that models lose up to 14.5% of accuracy compared to the reported performance.