Exploiting morphological symmetries in offline reinforcement learning
Reinforcement learning has enabled robotic agents to learn complex tasks, from locomotion to manipulation. While this usually requires interaction with the environment, such interaction can be costly or impractical. In these cases, offline reinforcement learning (ORL) allows agents to learn from pre...
| Author: | Lopez Closa, Júlia |
|---|---|
| Resource type: | master's thesis |
| Publication date: | 2025 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/445043 |
| Online access: | https://hdl.handle.net/2117/445043 |
| Access Level: | open access |
| Keywords: | Group theory; Reinforcement learning; Robotics; Data augmentation; Equivariant neural network; Offline reinforcement learning; Symmetry; Morphological symmetries; MDP symmetries; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic |
| id | ES_398fc57338231be5e6a29dcbb40d94db |
|---|---|
| oai_identifier_str | oai:upcommons.upc.edu:2117/445043 |
| network_acronym_str | ES |
| network_name_str | España |
| repository_id_str | |
| dc.title.none.fl_str_mv | Exploiting morphological symmetries in offline reinforcement learning |
| title | Exploiting morphological symmetries in offline reinforcement learning |
| dc.creator.none.fl_str_mv | Lopez Closa, Júlia |
| author | Lopez Closa, Júlia |
| author_facet | Lopez Closa, Júlia |
| author_role | author |
| dc.contributor.none.fl_str_mv | Martín Muñoz, Mario |
| dc.subject.none.fl_str_mv | Group theory; Reinforcement learning; Aprenentatge per reforçament; Robòtica; Teoria de grups; Augment de dades; Xarxa neuronal equivariant; Aprenentatge per reforçament fora de línia; Simetries de MDP; Data augmentation; Equivariant neural network; Offline reinforcement learning; Symmetry; Morphological symmetries; MDP symmetries; Grups, Teoria de; Aprenentatge per reforç; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic |
| description | Reinforcement learning has enabled robotic agents to learn complex tasks, from locomotion to manipulation. While this usually requires interaction with the environment, such interaction can be costly or impractical. In these cases, offline reinforcement learning (ORL) allows agents to learn from pre-collected data instead. However, this paradigm introduces challenges such as extrapolation error and the inability to explore beyond the dataset, making sufficient and diverse data essential. Many robots also exhibit structural regularities that preserve system dynamics under transformations. We refer to these as morphological symmetries, which can be formalized with group theory, applied with representation theory, and interpreted as symmetries of the underlying MDP. In this thesis, we explore how exploiting morphological symmetries can improve data efficiency, motion consistency, and generalization in ORL. Specifically, we investigate two complementary approaches: (1) data augmentation via symmetry transformations and (2) equivariant neural architectures based on invariant and equivariant MLPs. We evaluate their performance across multiple robotic environments and datasets of varying quality, and propose an extension of TD3+BC, RAISymE(TD3+BC), that mitigates mean-seeking behavior arising from dataset multimodality introduced through symmetry-based augmentation. Our results show that, when the behavior policy induces an overlapping support across symmetric regions of the state space, exploiting morphological symmetries leads to consistent performance gains in data-scarce scenarios and promotes more symmetric policies. |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025; 2025-10-22; 2025; 2025-10-30 |
| dc.type.none.fl_str_mv | master thesis; http://purl.org/coar/resource_type/c_bdcc; NA; http://purl.org/coar/version/c_be7fb7dd8ff6fe43 |
| dc.type.openaire.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| dc.identifier.none.fl_str_mv | https://hdl.handle.net/2117/445043 |
| url | https://hdl.handle.net/2117/445043 |
| dc.language.none.fl_str_mv | Inglés; eng |
| language_invalid_str_mv | Inglés |
| language | eng |
| dc.rights.none.fl_str_mv | open access; http://purl.org/coar/access_right/c_abf2 |
| dc.rights.openaire.fl_str_mv | info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | open access; http://purl.org/coar/access_right/c_abf2 |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universitat Politècnica de Catalunya |
| publisher.none.fl_str_mv | Universitat Politècnica de Catalunya |
| dc.source.none.fl_str_mv | reponame:UPCommons. Portal del coneixement obert de la UPC; instname:Universitat Politècnica de Catalunya (UPC) |
| instname_str | Universitat Politècnica de Catalunya (UPC) |
| reponame_str | UPCommons. Portal del coneixement obert de la UPC |
| collection | UPCommons. Portal del coneixement obert de la UPC |
| repository.name.fl_str_mv | |
| repository.mail.fl_str_mv | |
| _version_ | 1869406171839856640 |
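
The first approach named in the abstract, data augmentation via symmetry transformations, can be illustrated with a minimal sketch. It assumes a single reflection symmetry that acts on states and actions through signed permutations, and it mirrors every stored transition while leaving the reward unchanged. That is valid only if the dynamics and reward are invariant under the transformation. The `SignedPermutation` class, the `augment` function, and the toy index and sign choices are hypothetical illustrations, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A transition is (state, action, reward, next_state).
Transition = Tuple[List[float], List[float], float, List[float]]


@dataclass(frozen=True)
class SignedPermutation:
    """Hypothetical group-action representation: out[i] = signs[i] * x[perm[i]]."""
    perm: Sequence[int]
    signs: Sequence[float]

    def apply(self, x: Sequence[float]) -> List[float]:
        return [s * x[p] for p, s in zip(self.perm, self.signs)]


def augment(dataset: Sequence[Transition],
            state_rep: SignedPermutation,
            action_rep: SignedPermutation) -> List[Transition]:
    """Return the original transitions followed by their mirrored copies.

    Assumes the MDP is symmetric under the transformation, so the reward
    of the mirrored transition equals the reward of the original one.
    """
    augmented = list(dataset)
    for s, a, r, s_next in dataset:
        augmented.append(
            (state_rep.apply(s), action_rep.apply(a), r, state_rep.apply(s_next))
        )
    return augmented


# Toy reflection (assumed for illustration): swap the left/right state
# features, flip the sign of a lateral feature, and swap the two actuators.
state_rep = SignedPermutation(perm=[1, 0, 2], signs=[1.0, 1.0, -1.0])
action_rep = SignedPermutation(perm=[1, 0], signs=[1.0, 1.0])

dataset = [([0.1, 0.2, 0.5], [1.0, -1.0], 0.7, [0.3, 0.4, 0.6])]
augmented = augment(dataset, state_rep, action_rep)
```

Because a reflection is its own inverse, applying `state_rep` twice returns the original state, which is a quick sanity check for any representation chosen this way.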