Group theory and symmetries for Machine Learning applying to robotics
| Author: | |
|---|---|
| Resource type: | master's thesis |
| Publication date: | 2023 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/405721 |
| Online access: | https://hdl.handle.net/2117/405721 |
| Access level: | open access |
| Keywords: | Machine learning; Reinforcement learning; Robotics; morphological symmetry; symmetric Markov Decision Process; equivariant neural networks; symmetric data augmentation; online Reinforcement Learning; offline Reinforcement Learning; UPC subject areas::Computer science::Artificial intelligence::Machine learning |
| Abstract: | Recently, Deep Reinforcement Learning (DRL) has found numerous applications in robot control, enabling robots to learn complex tasks and adapt to new environments. However, DRL is notoriously sensitive to hyperparameters and suffers from low sample efficiency. Moreover, striking the right balance between exploring the environment and exploiting known strategies (the exploration-exploitation dilemma) poses a significant challenge for applying DRL. In robotics, symmetry plays an important role in the design and control of robotic systems. Because they give rise to equivariant and invariant properties, morphological symmetries in robots have great potential to alleviate the exploration-exploitation dilemma and to substantially improve sample efficiency during DRL training. They can also improve generalization, since symmetric architectures allow robots to transfer their learned policies to new, unseen scenarios. To exploit these benefits, we extend this symmetric inductive bias to Reinforcement Learning (RL) by defining a symmetric Markov Decision Process (MDP) for robotic systems. We incorporate the symmetric MDP into the DRL pipeline in two ways: (1) imposing symmetry constraints on the design of the neural layers to construct equivariant/invariant networks, or (2) performing data augmentation with the symmetry transformations to enlarge the streaming replay buffer or the fixed offline dataset during training. We evaluated both methods in online and offline RL settings and found that each can significantly improve the performance of DRL models on locomotion control tasks. Under certain conditions, such as a small training set in offline RL, combining the two helps the agent learn a better control policy. Incorporating an equivariant neural network into the Behavior Cloning algorithm yielded highly effective policies for the Cartpole control task: the policy converged to the optimal solution after the first epoch and remained stable in subsequent training epochs. On the TriFinger task, symmetric data augmentation substantially improved the results of the Implicit Q-Learning (IQL) algorithm when training with 10% to 30% of the original dataset; augmenting 30% of the data gave the highest return, 1353. For online RL with the Soft Actor-Critic (SAC) algorithm in the IsaacGym Cartpole environment, symmetric data augmentation yielded a return consistently 40 points higher than training without it over the last 100 training epochs. |
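
The two mechanisms described in the abstract, equivariant layers and symmetric data augmentation, can be illustrated on a cart-pole. What follows is a minimal sketch under stated assumptions, not the thesis implementation. It assumes that the reflection group C2 = {e, g} acts on the state s = (x, x_dot, theta, theta_dot) and on the scalar force action by negating them, and that rewards and termination flags are invariant under g. It also assumes a hypothetical hidden layer that carries the regular representation of C2, where g swaps the two halves of the channels. Equivariant weights are obtained here by group averaging, which is one generic way to enforce the constraint W rho_in(g) = rho_out(g) W. The names `project`, `project_bias`, `policy`, and `augment` are illustrative only.

```python
import numpy as np

# Illustrative C2 = {e, g} symmetry of a cart-pole (assumption, not the thesis code):
# g negates the state s = (x, x_dot, theta, theta_dot) and the force action a.
rng = np.random.default_rng(0)
H = 8  # hypothetical number of hidden channel pairs

rho_state = {"e": np.eye(4), "g": -np.eye(4)}
rho_action = {"e": np.eye(1), "g": -np.eye(1)}
# Hidden layer uses the regular representation of C2: g swaps the two halves.
swap = np.block([[np.zeros((H, H)), np.eye(H)], [np.eye(H), np.zeros((H, H))]])
rho_hidden = {"e": np.eye(2 * H), "g": swap}


def project(W, rho_out, rho_in):
    """Group-average W so that W_eq @ rho_in[g] == rho_out[g] @ W_eq for all g."""
    return sum(rho_out[k] @ W @ np.linalg.inv(rho_in[k]) for k in rho_in) / len(rho_in)


def project_bias(b, rho_out):
    """Average b over the group so it is a fixed point: rho_out[g] @ b == b."""
    return sum(rho_out[k] @ b for k in rho_out) / len(rho_out)


# Equivariant two-layer network (e.g. a deterministic policy mean).
W1 = project(rng.normal(size=(2 * H, 4)), rho_hidden, rho_state)
b1 = project_bias(rng.normal(size=2 * H), rho_hidden)
W2 = project(rng.normal(size=(1, 2 * H)), rho_action, rho_hidden)
b2 = project_bias(rng.normal(size=1), rho_action)  # averaging forces this to zero


def policy(s):
    # ReLU is pointwise, so it commutes with the permutation matrix `swap`.
    return W2 @ np.maximum(W1 @ s + b1, 0.0) + b2


def augment(batch):
    """Symmetric data augmentation: append the g-image of every transition.

    Rewards and done flags are assumed g-invariant and are copied unchanged.
    """
    s, a, r, s_next, done = batch
    g_s, g_a = rho_state["g"], rho_action["g"]
    return (
        np.concatenate([s, s @ g_s.T]),
        np.concatenate([a, a @ g_a.T]),
        np.concatenate([r, r]),
        np.concatenate([s_next, s_next @ g_s.T]),
        np.concatenate([done, done]),
    )


# Usage: check equivariance and augment a random batch of 32 transitions.
s = rng.normal(size=4)
print(np.allclose(policy(rho_state["g"] @ s), rho_action["g"] @ policy(s)))  # True
batch = (rng.normal(size=(32, 4)), rng.normal(size=(32, 1)), np.ones(32),
         rng.normal(size=(32, 4)), np.zeros(32, dtype=bool))
print(augment(batch)[0].shape)  # (64, 4)
```

ReLU acts pointwise, so it commutes with the permutation representation. As a result, the stacked network satisfies policy(g s) = g policy(s) exactly, and the projection forces the output bias to zero. `augment` doubles a batch, which shows in miniature how a replay buffer or an offline dataset could be enlarged with symmetric copies of each transition.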