Sustainable pavement maintenance using reinforcement learning with systematic reward design

Molinero-Pérez, Noelia|||0000-0001-8279-4585; Sanz-Benlloch, Amalia|||0000-0001-8051-0649; Montalbán-Domingo, Laura|||0000-0002-9506-0350; García-Segura, Tatiana|||0000-0002-7059-0566

Sustainable pavement maintenance using reinforcement learning with systematic reward design

[EN] Efficient and sustainable pavement maintenance planning remains a challenge for infrastructure managers. This study introduces a reinforcement learning methodology, based on the Q-Learning algorithm, to optimize long-term pavement maintenance policies under multiple objectives. The contribution...

Descripción completa

Detalles Bibliográficos
Autores:	Molinero-Pérez, Noelia\|\|\|0000-0001-8279-4585, Sanz-Benlloch, Amalia\|\|\|0000-0001-8051-0649, Montalbán-Domingo, Laura\|\|\|0000-0002-9506-0350, García-Segura, Tatiana\|\|\|0000-0002-7059-0566
Tipo de recurso:	artículo
Fecha de publicación:	2026
País:	España
Institución:	Universitat Politècnica de València (UPV)
Repositorio:	RiuNet. Repositorio Institucional de la Universitat Politécnica de Valéncia
Idioma:	inglés
OAI Identifier:	oai:dnet:riunet______::f083a6c165cdf0b9adb0b97a6fef84b4
Acceso en línea:	https://riunet.upv.es/handle/10251/234367
Access Level:	acceso embargado
Palabra clave:	Pavement maintenance Machine learning Systematic reward design Sustainability Q-learning 09.- Desarrollar infraestructuras resilientes, promover la industrialización inclusiva y sostenible, y fomentar la innovación 11.- Conseguir que las ciudades y los asentamientos humanos sean inclusivos, seguros, resilientes y sostenibles

Descripción
Sumario:	[EN] Efficient and sustainable pavement maintenance planning remains a challenge for infrastructure managers. This study introduces a reinforcement learning methodology, based on the Q-Learning algorithm, to optimize long-term pavement maintenance policies under multiple objectives. The contribution is a systematic framework for designing tailored reward functions aligned with planning goals, including economic cost, environmental impact, user savings, and maintenance effectiveness. This alignment enables the model to adapt and learn optimal strategies according to different priorities. The approach is validated through fifteen real-world case studies from the Spanish road network, incorporating traffic, structural, and climatic data. For each case, Q-Learning is trained with alternative reward formulations and evaluated over a 20-year horizon, followed by robustness analysis to assess policy stability. Results show that reward functions lead to differentiated intervention strategies, influenced by the planning objective. Cost- and emissions-oriented rewards generate reactive policies, recommending intervention only when inaction would significantly increase maintenance and rehabilitation costs. In contrast, rewards focused on user savings or technical effectiveness promote proactive continuous maintenance to preserve surface conditions and reduce indirect user costs. The proposed approach underscores the critical role of reward definition in reinforcement learning and provides a practical tool to support adaptive, sustainable, and long-term pavement management.

Sustainable pavement maintenance using reinforcement learning with systematic reward design

Similares en LA Referencia