Sustainable pavement maintenance using reinforcement learning with systematic reward design

Bibliographic details
Authors: Molinero-Pérez, Noelia (ORCID 0000-0001-8279-4585); Sanz-Benlloch, Amalia (ORCID 0000-0001-8051-0649); Montalbán-Domingo, Laura (ORCID 0000-0002-9506-0350); García-Segura, Tatiana (ORCID 0000-0002-7059-0566)
Format: article
Publication date: 2026
Country: Spain
Resources: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:dnet:riunet______::f083a6c165cdf0b9adb0b97a6fef84b4
Online access: https://riunet.upv.es/handle/10251/234367
Access Level: embargoed access
Keywords: Pavement maintenance
Machine learning
Systematic reward design
Sustainability
Q-learning
09.- Build resilient infrastructure, promote inclusive and sustainable industrialization, and foster innovation
11.- Make cities and human settlements inclusive, safe, resilient, and sustainable
Description
Abstract: [EN] Efficient and sustainable pavement maintenance planning remains a challenge for infrastructure managers. This study introduces a reinforcement learning methodology, based on the Q-Learning algorithm, to optimize long-term pavement maintenance policies under multiple objectives. The contribution is a systematic framework for designing tailored reward functions aligned with planning goals, including economic cost, environmental impact, user savings, and maintenance effectiveness. This alignment enables the model to adapt and learn optimal strategies according to different priorities. The approach is validated through fifteen real-world case studies from the Spanish road network, incorporating traffic, structural, and climatic data. For each case, Q-Learning is trained with alternative reward formulations and evaluated over a 20-year horizon, followed by robustness analysis to assess policy stability. Results show that reward functions lead to differentiated intervention strategies, influenced by the planning objective. Cost- and emissions-oriented rewards generate reactive policies, recommending intervention only when inaction would significantly increase maintenance and rehabilitation costs. In contrast, rewards focused on user savings or technical effectiveness promote proactive continuous maintenance to preserve surface conditions and reduce indirect user costs. The proposed approach underscores the critical role of reward definition in reinforcement learning and provides a practical tool to support adaptive, sustainable, and long-term pavement management.
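
The idea in the abstract, tabular Q-Learning trained under alternative reward formulations, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the five condition states, the three actions, the cost and emission figures, the deterioration probability, and the weighted-sum form of the reward are all hypothetical. Only the four objective names and the 20-year horizon come from the abstract.

```python
import random

# Hypothetical toy model: condition states 0 (good) .. 4 (failed).
# None of these numbers come from the study.
N_STATES = 5
ACTIONS = ("do_nothing", "preventive", "rehabilitate")
HORIZON = 20  # years, matching the evaluation horizon named in the abstract

# Assumed per-action agency cost and emissions, in arbitrary units.
COST = {"do_nothing": 0.0, "preventive": 1.0, "rehabilitate": 5.0}
EMISSIONS = {"do_nothing": 0.0, "preventive": 0.5, "rehabilitate": 3.0}


def step(state, action, rng):
    """Assumed transition: treatments improve condition, inaction may worsen it."""
    if action == "rehabilitate":
        return 0
    if action == "preventive":
        return max(state - 1, 0)
    return min(state + (1 if rng.random() < 0.6 else 0), N_STATES - 1)


def reward(state, action, next_state, weights):
    """Weighted sum of four objective terms; the weights encode the priority."""
    terms = {
        "cost": -COST[action],
        "emissions": -EMISSIONS[action],
        "user": -float(next_state),                   # worse surface, higher user cost
        "effectiveness": float(state - next_state),   # condition gained by the action
    }
    return sum(weights[k] * terms[k] for k in terms)


def train(weights, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0  # each episode starts from a pavement in good condition
        for _ in range(HORIZON):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2 = step(s, ACTIONS[a], rng)
            r = reward(s, ACTIONS[a], s2, weights)
            # Standard Q-Learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Best action per condition state according to the learned Q-table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    cost_only = {"cost": 1.0, "emissions": 0.0, "user": 0.0, "effectiveness": 0.0}
    user_only = {"cost": 0.0, "emissions": 0.0, "user": 1.0, "effectiveness": 0.0}
    print("cost-oriented:", greedy_policy(train(cost_only)))
    print("user-oriented:", greedy_policy(train(user_only)))
```

In this toy setting the cost-only weighting never intervenes, because inaction carries no cost penalty here, whereas the user-only weighting already treats a pavement in good condition. This only loosely mirrors the reactive versus proactive contrast reported in the abstract. The study's own models use real traffic, structural, and climatic data, and should be expected to behave differently.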