Exploring reward strategies for wind turbine pitch control by reinforcement learning

Bibliographic details
Authors: Sierra-García, Jesús Enrique; Santos Peñas, Matilde
Resource type: article
Publication date: 2020
Country: Spain
Institution: Universidad Complutense de Madrid (UCM)
Repository: Docta Complutense
Language: English
OAI Identifier: oai:docta.ucm.es:20.500.14352/112247
Online access: https://hdl.handle.net/20.500.14352/112247
Access level: open access
Keywords: Intelligent control
Pitch control
Wind turbines
Wind energy
Reinforcement learning
Reward strategies
Artificial intelligence (Computer science)
1203.04 Artificial Intelligence
Description
Summary: In this work, a reinforcement learning (RL)-inspired pitch controller for a wind turbine (WT) is designed and implemented. The control system consists of a state estimator, a reward strategy, a policy table, and a policy update algorithm. Novel reward strategies related to the energy deviation from the rated power are defined, with the aim of improving the efficiency of the WT. Two new categories of reward strategies are proposed: “only positive” (O-P) and “positive-negative” (P-N) rewards. The relationship of these categories to the exploration-exploitation dilemma, the use of ϵ-greedy methods, and learning convergence is also introduced and linked to the WT control problem. In addition, an extensive analysis of the influence of the different rewards on controller performance and learning speed is carried out. The controller is compared with a proportional-integral-derivative (PID) regulator on the same small wind turbine and obtains better results. The simulations show that the P-N rewards improve the controller's performance, stabilize the output power around the rated power, and reduce the error over time.
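The summary names four parts (state estimator, reward strategy, policy table, policy update algorithm) and two reward categories, but gives no formulas. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The error bins, the pitch-increment action set, both reward shapes, the toy plant, and every function name (`state_of`, `reward_only_positive`, `reward_positive_negative`, `choose_action`, `update`, `toy_power`, `run`) are hypothetical. The update is a simple incremental-average, bandit-style rule chosen as a stand-in for the unspecified policy update algorithm.

```python
import random

# ASSUMPTION: every name and number below is hypothetical; the summary does not
# give the state bins, action set, reward formulas or update rule.

# State estimator bins for the normalised power error e = (P - P_rated) / P_rated.
ERROR_EDGES = [-0.2, -0.05, 0.05, 0.2]
N_STATES = len(ERROR_EDGES) + 1
# Discrete actions: pitch-angle increments in degrees.
ACTIONS = [-1.0, 0.0, 1.0]


def state_of(error):
    """Return the index of the error bin that contains `error`."""
    for i, edge in enumerate(ERROR_EDGES):
        if error < edge:
            return i
    return len(ERROR_EDGES)


def reward_only_positive(error):
    """O-P style: always > 0, largest (1.0) when the error is zero."""
    return 1.0 / (1.0 + abs(error))


def reward_positive_negative(error, tolerance=0.05):
    """P-N style: positive inside the tolerance band, negative outside it."""
    return (tolerance - abs(error)) / tolerance


def choose_action(table, state, epsilon, rng):
    """epsilon-greedy: random action with probability epsilon, else best in row."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = table[state]
    return max(range(len(ACTIONS)), key=lambda a: row[a])


def update(table, state, action, reward, alpha=0.1):
    """Move the stored value a fraction alpha towards the latest reward."""
    table[state][action] += alpha * (reward - table[state][action])


def toy_power(pitch_deg, wind_ratio):
    """Hypothetical static plant in per-unit, NOT a wind turbine model:
    power grows with the cube of wind speed and falls linearly with pitch."""
    return wind_ratio ** 3 * max(0.0, 1.0 - pitch_deg / 30.0)


def run(reward_fn, steps=2000, epsilon=0.1, wind_ratio=1.2, seed=0):
    """Train a zero-initialised policy table on the toy plant."""
    rng = random.Random(seed)
    table = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    pitch = 0.0
    abs_errors = []
    for _ in range(steps):
        state = state_of(toy_power(pitch, wind_ratio) - 1.0)  # rated power = 1.0 p.u.
        action = choose_action(table, state, epsilon, rng)
        # Apply the pitch increment, saturated to [0, 30] degrees.
        pitch = min(max(pitch + ACTIONS[action], 0.0), 30.0)
        error = toy_power(pitch, wind_ratio) - 1.0
        update(table, state, action, reward_fn(error))
        abs_errors.append(abs(error))
    return table, abs_errors
```

For example, `run(reward_only_positive)` and `run(reward_positive_negative)` can be compared on their returned error histories. In this sketch the table starts at zero. Under O-P rewards, the first action tried in a state immediately looks better than the untried ones, so only the ϵ-greedy draws move the controller off that choice. P-N rewards, by contrast, can drive a poor action's value below zero and hand the greedy choice to an untried action. That is one plausible reading of the link the summary draws between reward sign and exploration, not a statement of the authors' analysis. The toy numbers also say nothing about the published results.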