Sustainable pavement maintenance using reinforcement learning with systematic reward design
| Authors: | , , , |
|---|---|
| Resource type: | article |
| Publication date: | 2026 |
| Country: | Spain |
| Institution: | Universitat Politècnica de València (UPV) |
| Repository: | RiuNet. Repositorio Institucional de la Universitat Politècnica de València |
| Language: | English |
| OAI Identifier: | oai:dnet:riunet______::f083a6c165cdf0b9adb0b97a6fef84b4 |
| Online access: | https://riunet.upv.es/handle/10251/234367 |
| Access level: | embargoed access |
| Keywords: | Pavement maintenance; Machine learning; Systematic reward design; Sustainability; Q-learning; 09.- Build resilient infrastructure, promote inclusive and sustainable industrialization, and foster innovation; 11.- Make cities and human settlements inclusive, safe, resilient and sustainable |
| Summary: | Efficient and sustainable pavement maintenance planning remains a challenge for infrastructure managers. This study introduces a reinforcement learning methodology, based on the Q-Learning algorithm, to optimize long-term pavement maintenance policies under multiple objectives. The contribution is a systematic framework for designing tailored reward functions aligned with planning goals, including economic cost, environmental impact, user savings, and maintenance effectiveness. This alignment enables the model to adapt and learn optimal strategies according to different priorities. The approach is validated through fifteen real-world case studies from the Spanish road network, incorporating traffic, structural, and climatic data. For each case, Q-Learning is trained with alternative reward formulations and evaluated over a 20-year horizon, followed by a robustness analysis to assess policy stability. Results show that different reward functions lead to distinct intervention strategies, depending on the planning objective. Cost- and emissions-oriented rewards generate reactive policies, recommending intervention only when inaction would significantly increase maintenance and rehabilitation costs. In contrast, rewards focused on user savings or technical effectiveness promote proactive, continuous maintenance to preserve surface conditions and reduce indirect user costs. The proposed approach underscores the critical role of reward definition in reinforcement learning and provides a practical tool to support adaptive, sustainable, and long-term pavement management. |
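
The idea summarized above, training tabular Q-Learning under alternative reward formulations, can be illustrated with a minimal sketch. This is not the authors' model: the five condition states, the three actions, the deterioration probability, and every cost, emission, and user-cost figure below are hypothetical placeholders, and the one-step-per-year 20-step episode is an assumption made only to echo the 20-year horizon mentioned in the summary.

```python
import random

# All values below are hypothetical and only illustrate the structure.
N_STATES = 5          # condition levels: 0 = best ... 4 = worst (assumed)
ACTIONS = (0, 1, 2)   # 0 = do nothing, 1 = preventive treatment, 2 = rehabilitation

AGENCY_COST = (0.0, 1.0, 4.0)          # per-action agency cost (made up)
EMISSIONS = (0.0, 0.5, 3.0)            # per-action emissions (made up)
USER_COST = (0.0, 0.5, 1.5, 3.0, 6.0)  # per-state user cost (made up)


def step(state, action, rng):
    """Toy transition model, not calibrated to any real pavement data."""
    if action == 2:                      # rehabilitation resets the condition
        return 0
    if action == 1:                      # preventive treatment improves one level
        return max(state - 1, 0)
    # Do nothing: deteriorate one level with an assumed probability of 0.6.
    return min(state + 1, N_STATES - 1) if rng.random() < 0.6 else state


def reward(state, action, next_state, weights):
    """Negative weighted sum of agency cost, emissions, and user cost."""
    w_cost, w_env, w_user = weights
    return -(w_cost * AGENCY_COST[action]
             + w_env * EMISSIONS[action]
             + w_user * USER_COST[next_state])


def train(weights, episodes=2000, horizon=20, alpha=0.1, gamma=0.95,
          epsilon=0.1, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):         # one step per year (assumption)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            s2 = step(s, a, rng)
            r = reward(s, a, s2, weights)
            # Standard Q-Learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Best action per condition state according to the learned Q-table."""
    return [max(ACTIONS, key=lambda a: row[a]) for row in q]
```

Comparing `greedy_policy(train((1.0, 0.0, 0.0)))` with `greedy_policy(train((0.0, 0.0, 1.0)))` shows how the weight vector alone changes the learned policy in this toy setting; the specific policies it produces say nothing about the results reported in the article.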