Machine learning XAI for early loan default prediction

Bibliographic Details
Authors: Monje, Leticia; Carrasco González, Ramón Alberto; Sánchez-Montañes, Manuel
Resource type: article
Publication date: 2025
Country: Spain
Institution: Universidad Complutense de Madrid (UCM)
Repository: Docta Complutense
Language: English
OAI Identifier: oai:docta.ucm.es:20.500.14352/132721
Online access: https://hdl.handle.net/20.500.14352/132721
Access level: open access
Keywords: 004.85
519.2
336.77
Peer to peer lending
Machine learning
Early default risk
XAI
Fuzzy
Artificial intelligence (Computer science)
Statistics
1203.04 Artificial Intelligence
1209 Statistics
1209.14 Statistical Prediction Techniques
Description
Summary: Early default prediction with predictive models is crucial for financial institutions, fintech companies, and peer-to-peer (P2P) lending platforms, because it lets them mitigate the risks associated with customer or debtor defaults by anticipating a default before it becomes a major problem. This proactive approach helps avoid the consequent impact on provisions and, subsequently, on the institution's capital. However, advanced predictive models are often less interpretable than traditional models such as probit (Abdou & Pointon, 2011) and logistic regression (Bolton, 2009; Liu et al., 2024). Because of this lower explainability, our goal was to develop a methodology for building an advanced predictive model from large volumes of data together with a linguistically interpretable explanation that is useful for decision making. Our case study was the loan dataset of Lending Club, the largest P2P lending platform in the world. As a result, we obtained a model based on eXtreme Gradient Boosting (XGBoost), together with its linguistic interpretation obtained using a surrogate model and the 2-tuple fuzzy linguistic model (Monje et al., Mathematics 10:1428, 2022). This model allows us to identify five risk categories (very low, low, medium, high and very high).
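
The 2-tuple fuzzy linguistic representation mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a predicted default probability p in [0, 1] is scaled linearly onto the index range of the five risk labels, and it uses the standard 2-tuple definition in which a value is written as the closest label plus a symbolic translation alpha in [-0.5, 0.5). The function names `to_two_tuple` and `from_two_tuple` are hypothetical.

```python
import math

# Five risk labels named in the summary, ordered from lowest to highest risk.
LABELS = ["very low", "low", "medium", "high", "very high"]


def to_two_tuple(p, labels=LABELS):
    """Map a score p in [0, 1] to a 2-tuple (label, alpha).

    Assumption (not from the article): p is scaled linearly onto [0, g],
    where g = len(labels) - 1 is the index of the highest label.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    g = len(labels) - 1
    beta = p * g
    # Round half up so that alpha always falls in [-0.5, 0.5).
    i = math.floor(beta + 0.5)
    alpha = beta - i
    return labels[i], alpha


def from_two_tuple(label, alpha, labels=LABELS):
    """Inverse mapping: recover the score in [0, 1] from (label, alpha)."""
    g = len(labels) - 1
    return (labels.index(label) + alpha) / g
```

Under these assumptions, `to_two_tuple(0.125)` returns `("low", -0.5)`, which reads as "low, shifted half a step toward very low". The pair keeps the exact score, so `from_two_tuple` recovers it without the information loss of plain binning.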