06 Evaluation of state-of-art phishing detection strategies based on machine learning
| Authors: | , , , , , , |
|---|---|
| Resource type: | book chapter |
| Publication date: | 2021 |
| Country: | Spain |
| Institution: | Universidad de Castilla-La Mancha |
| Repository: | RUIdeRA. Repositorio Institucional de la UCLM |
| OAI Identifier: | oai:ruidera.uclm.es:10578/28607 |
| Online access: | http://doi.org/10.18239/jornadas_2021.34.06 http://hdl.handle.net/10578/28607 |
| Access Level: | open access |
| Keywords: | cybersecurity Phishing Detection URL Artificial Intelligence Machine Learning NLP |
| Summary: | Phishing is one of the most common cyber-attacks. Machine Learning approaches can effectively deal with phishing detection. However, models are trained on datasets whose legitimate samples are landing pages without login forms, whereas legitimate login pages are closer to the real-world problem. In this work, we present the Phishing Index Login URL (PILU-60K) dataset, which contains URLs of both index pages and login pages. In addition, five of the most widely used Machine Learning models were implemented, tested on PILU-60K, and compared with well-known datasets. We trained the models on index pages and tested them on login pages to determine whether performance is affected when the models have to classify login URLs. We also reviewed the performance of the models over time by training them on datasets from 2016 and 2017 and testing them on recent ones. Results showed that models lose up to 14.5% of accuracy compared to the reported performance. |
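The evaluation protocol described in the summary (train on index-page URLs, then test on login-page URLs and compare) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the character n-gram Naive Bayes classifier is a stand-in and not necessarily one of the five models used in the work, the URLs are invented placeholders rather than PILU-60K samples, and the names `char_ngrams`, `NgramNaiveBayes`, and `accuracy` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(url, n=3):
    """Split a URL into overlapping character n-grams."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]


class NgramNaiveBayes:
    """Tiny multinomial Naive Bayes over URL n-grams (stand-in model, an assumption)."""

    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.ngram_counts = {c: Counter() for c in self.classes}
        for url, label in zip(urls, labels):
            self.ngram_counts[label].update(char_ngrams(url))
        self.vocab_size = len(set().union(*self.ngram_counts.values()))
        return self

    def _log_score(self, url, c):
        total = sum(self.ngram_counts[c].values())
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        for gram in char_ngrams(url):
            # Laplace smoothing keeps unseen n-grams from zeroing the score.
            count = self.ngram_counts[c][gram]
            score += math.log((count + 1) / (total + self.vocab_size))
        return score

    def predict(self, urls):
        return [max(self.classes, key=lambda c: self._log_score(url, c))
                for url in urls]


def accuracy(model, urls, labels):
    """Fraction of URLs whose predicted label matches the true label."""
    preds = model.predict(urls)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Invented placeholder data: label 0 = legitimate, 1 = phishing.
index_train_urls = [
    "https://example.com/",
    "https://example.org/index.html",
    "http://secure-update.example.net/verify/account",
    "http://example.net.confirm-account.test/update",
]
index_train_labels = [0, 0, 1, 1]

index_test_urls = ["https://example.edu/", "http://verify-account.example.test/update"]
index_test_labels = [0, 1]

login_test_urls = [
    "https://example.com/login",
    "https://example.org/account/signin",
    "http://signin-verify.example.net/login.php",
]
login_test_labels = [0, 0, 1]

# Train on index-page URLs only, then compare the two test scenarios.
model = NgramNaiveBayes().fit(index_train_urls, index_train_labels)
acc_index = accuracy(model, index_test_urls, index_test_labels)
acc_login = accuracy(model, login_test_urls, login_test_labels)
print(f"accuracy on index URLs: {acc_index:.2f}")
print(f"accuracy on login URLs: {acc_login:.2f}")
print(f"difference: {acc_index - acc_login:+.2f}")
```

With a handful of toy URLs the printed numbers carry no meaning; the point is only the shape of the comparison, in which the same trained model is scored once on URLs resembling its training distribution and once on legitimate login URLs it never saw during training.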