Exploiting morphological symmetries in offline reinforcement learning

Reinforcement learning has enabled robotic agents to learn complex tasks, from locomotion to manipulation. While this usually requires interaction with the environment, such interaction can be costly or impractical. In these cases, offline reinforcement learning (ORL) allows agents to learn from pre...

Bibliographic details
Author: Lopez Closa, Júlia
Resource type: master's thesis
Publication date: 2025
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI identifier: oai:upcommons.upc.edu:2117/445043
Online access: https://hdl.handle.net/2117/445043
Access level: open access
Keywords: Group theory
Reinforcement learning
Robotics
Data augmentation
Equivariant neural network
Offline reinforcement learning
MDP symmetries
Symmetry
Morphological symmetries
UPC subject areas::Computer science::Artificial intelligence::Machine learning
Contributor: Martín Muñoz, Mario
Publisher: Universitat Politècnica de Catalunya
Dates: 2025-10-22, 2025-10-30
Format: application/pdf

Full description:
Reinforcement learning has enabled robotic agents to learn complex tasks, from locomotion to manipulation. While this usually requires interaction with the environment, such interaction can be costly or impractical. In these cases, offline reinforcement learning (ORL) allows agents to learn from pre-collected data instead. However, this paradigm introduces challenges such as extrapolation error and the inability to explore beyond the dataset, making sufficient and diverse data essential. Many robots also exhibit structural regularities that preserve system dynamics under transformations. We refer to these as morphological symmetries, which can be formalized with group theory, applied with representation theory, and interpreted as symmetries of the underlying MDP. In this thesis, we explore how exploiting morphological symmetries can improve data efficiency, motion consistency, and generalization in ORL. Specifically, we investigate two complementary approaches: (1) data augmentation via symmetry transformations and (2) equivariant neural architectures based on invariant and equivariant MLPs. We evaluate their performance across multiple robotic environments and datasets of varying quality, and propose an extension of TD3+BC, RAISymE(TD3+BC), that mitigates mean-seeking behavior arising from dataset multimodality introduced through symmetry-based augmentation. Our results show that, when the behavior policy induces an overlapping support across symmetric regions of the state space, exploiting morphological symmetries leads to consistent performance gains in data-scarce scenarios and promotes more symmetric policies.
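
To make the idea of a morphological symmetry interpreted as an MDP symmetry concrete, the following minimal sketch (pure Python, standard library only) uses a hypothetical 1-D point mass whose state is (position, velocity) and whose action is a force. The reflection group C2 = {e, g} acts on states and actions through the representation matrices `RHO_S` and `RHO_A`. The code checks that the reflection squares to the identity, as a group representation requires, and that the two conditions defining an MDP symmetry hold at a sample point: equivariant dynamics, T(g.s, g.a) = g.T(s, a), and invariant reward, R(g.s, g.a) = R(s, a). The system, `DT`, and the reward are illustrative assumptions, not the environments used in the thesis.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

DT = 0.05  # hypothetical integration time step (seconds)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Representations of the reflection group C2 = {e, g} on the toy point mass:
# state = (position, velocity), action = (force,). The reflection negates all three.
RHO_S: Dict[str, Matrix] = {"e": [[1.0, 0.0], [0.0, 1.0]], "g": [[-1.0, 0.0], [0.0, -1.0]]}
RHO_A: Dict[str, Matrix] = {"e": [[1.0]], "g": [[-1.0]]}


def step(state: Vector, action: Vector) -> Vector:
    """Toy dynamics: one semi-implicit Euler step of a unit point mass."""
    x, v = state
    v_next = v + DT * action[0]
    return [x + DT * v_next, v_next]


def reward(state: Vector, action: Vector) -> float:
    """Toy reward: penalise distance from the origin, speed and effort."""
    x, v = state
    return -(x * x + 0.1 * v * v + 0.01 * action[0] * action[0])


def is_mdp_symmetry(g: str, state: Vector, action: Vector, tol: float = 1e-12) -> bool:
    """Check T(g.s, g.a) = g.T(s, a) and R(g.s, g.a) = R(s, a) at one sample."""
    gs, ga = matvec(RHO_S[g], state), matvec(RHO_A[g], action)
    lhs = step(gs, ga)
    rhs = matvec(RHO_S[g], step(state, action))
    dynamics_ok = all(abs(p - q) <= tol for p, q in zip(lhs, rhs))
    reward_ok = abs(reward(gs, ga) - reward(state, action)) <= tol
    return dynamics_ok and reward_ok


if __name__ == "__main__":
    # Homomorphism check: applying the reflection twice is the identity.
    print(matmul(RHO_S["g"], RHO_S["g"]) == RHO_S["e"])  # True
    print(is_mdp_symmetry("g", [0.3, -1.2], [0.7]))  # True
```

Adding a reward term that favours reaching x = +1 would break the reward-invariance condition, so the reflection would then no longer be a symmetry of this MDP.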
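
The first approach named in the thesis abstract, data augmentation via symmetry transformations, can be sketched as follows, under the assumption that each group element acts on states and actions by a signed permutation (reordering and sign-flipping coordinates, as a left/right mirror typically does). Every transition (s, a, r, s', done) receives a mirrored copy (rho_S(g)s, rho_A(g)a, r, rho_S(g)s', done). The biped coordinate layout, the `Transition` record, and the helper names `apply` and `augment` are hypothetical; a real robot needs representations derived from its own morphology.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A signed permutation: output[i] = sign_i * input[index_i].
SignedPerm = Sequence[Tuple[int, float]]


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def apply(rep: SignedPerm, x: Sequence[float]) -> Tuple[float, ...]:
    """Apply a signed-permutation representation of a group element to a vector."""
    return tuple(sign * x[idx] for idx, sign in rep)


# Hypothetical left/right mirror of a biped:
# state = (left_hip, right_hip, torso_roll), action = (left_torque, right_torque).
STATE_REFLECTION: SignedPerm = [(1, 1.0), (0, 1.0), (2, -1.0)]
ACTION_REFLECTION: SignedPerm = [(1, 1.0), (0, 1.0)]


def augment(
    dataset: List[Transition], group_reps: List[Tuple[SignedPerm, SignedPerm]]
) -> List[Transition]:
    """Return the dataset plus one transformed copy per non-identity group element.

    Reward and termination are copied unchanged, which is only valid if the
    reward is invariant and the dynamics are equivariant under every element.
    """
    augmented = list(dataset)
    for rho_s, rho_a in group_reps:
        for t in dataset:
            augmented.append(
                Transition(
                    state=apply(rho_s, t.state),
                    action=apply(rho_a, t.action),
                    reward=t.reward,
                    next_state=apply(rho_s, t.next_state),
                    done=t.done,
                )
            )
    return augmented


if __name__ == "__main__":
    data = [Transition((0.1, -0.2, 0.05), (1.0, 0.5), 0.9, (0.12, -0.18, 0.04), False)]
    aug = augment(data, [(STATE_REFLECTION, ACTION_REFLECTION)])
    print(len(aug))       # 2
    print(aug[1].state)   # (-0.2, 0.1, -0.05)
    print(aug[1].action)  # (0.5, 1.0)
```

Copying r unchanged is exactly where the MDP-symmetry assumption enters: if the reward or the dynamics are not symmetric, the mirrored transitions are not valid samples of the same MDP.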
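
For the second approach, equivariant architectures built from invariant and equivariant MLPs, one simple construction (not necessarily the one used in the thesis) is group averaging: any weight matrix W can be projected onto the equivariant subspace with W_eq = (1/|G|) sum_g rho_out(g) W rho_in(g)^-1, and choosing the trivial representation for the output yields an invariant layer. The sketch below builds a one-layer equivariant actor, pi(rho_S(g)s) = rho_A(g)pi(s), and an invariant critic, Q(rho_S(g)s, rho_A(g)a) = Q(s, a), for a hypothetical left/right mirror of a biped (swap the two hip coordinates and the two torques, negate the torso roll). It relies on tanh being odd and pointwise, so it commutes with signed-permutation matrices; a ReLU would not commute with the sign flip. Biases are omitted for brevity, and random weights stand in for trained ones.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def block_diag(a: Matrix, b: Matrix) -> Matrix:
    """Block-diagonal matrix diag(a, b) for square a and b."""
    top = [row + [0.0] * len(b) for row in a]
    bottom = [[0.0] * len(a) + row for row in b]
    return top + bottom


def equivariant_projection(w: Matrix, rho_out: List[Matrix], rho_in: List[Matrix]) -> Matrix:
    """Average rho_out(g) @ W @ rho_in(g)^T over the group.

    The result satisfies rho_out(g) @ W = W @ rho_in(g) for every g, assuming
    orthogonal representations (so rho(g)^-1 = rho(g)^T) listed in the same
    group-element order in rho_out and rho_in.
    """
    total = [[0.0] * len(w[0]) for _ in w]
    for r_out, r_in in zip(rho_out, rho_in):
        term = matmul(matmul(r_out, w), transpose(r_in))
        total = [[t + x for t, x in zip(tr, xr)] for tr, xr in zip(total, term)]
    return [[t / len(rho_in) for t in row] for row in total]


# Hypothetical left/right reflection, C2 = {e, g}, as signed-permutation matrices.
P_S = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]  # swap hips, negate roll
P_A = [[0.0, 1.0], [1.0, 0.0]]  # swap left/right torques
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
I2 = [[1.0, 0.0], [0.0, 1.0]]
REP_S, REP_A = [I3, P_S], [I2, P_A]
REP_SA = [block_diag(s, a) for s, a in zip(REP_S, REP_A)]  # acts on state + action
TRIVIAL = [[[1.0]], [[1.0]]]  # the group leaves Q-values unchanged

rng = random.Random(0)
W_ACTOR = equivariant_projection(
    [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)], REP_A, REP_S)
U_CRITIC = equivariant_projection(
    [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5)], REP_SA, REP_SA)
V_CRITIC = equivariant_projection([[rng.gauss(0.0, 1.0) for _ in range(5)]], TRIVIAL, REP_SA)


def actor(state: Vector) -> Vector:
    """Equivariant policy: tanh is odd and pointwise, so it commutes with signed permutations."""
    return [math.tanh(z) for z in matvec(W_ACTOR, state)]


def critic(state: Vector, action: Vector) -> float:
    """Invariant Q-function: equivariant hidden layer followed by an invariant read-out."""
    hidden = [math.tanh(z) for z in matvec(U_CRITIC, state + action)]
    return matvec(V_CRITIC, hidden)[0]


if __name__ == "__main__":
    s, a = [0.4, -0.1, 0.2], [0.3, -0.7]
    gs, ga = matvec(P_S, s), matvec(P_A, a)
    print(max(abs(x - y) for x, y in zip(actor(gs), matvec(P_A, actor(s)))) < 1e-9)  # True
    print(abs(critic(gs, ga) - critic(s, a)) < 1e-9)  # True
```

Equivariant-network libraries typically compute a basis of the equivariant subspace and learn coefficients in it rather than projecting random matrices; the averaging above is used only because it is short and easy to verify.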
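
The thesis abstract attributes mean-seeking behavior in TD3+BC to the multimodality that symmetry-based augmentation can introduce. The sketch below, a simplified illustration rather than the thesis's method, first evaluates the TD3+BC actor objective, -lambda * mean Q(s, pi(s)) + mean (pi(s) - a)^2, with lambda = alpha / mean |Q(s, pi(s))| and alpha = 2.5 (the default in the original TD3+BC paper). It then fits a single deterministic action to the behavior-cloning term alone. At a state left unchanged by the reflection (g.s = s), augmentation attaches both a and rho_A(g)a to the same state, so the data contain, for example, +1 and -1, and the least-squares fit converges to their mean, 0, an action neither mode takes. RAISymE(TD3+BC) is not reproduced here; `fit_bc_action`, its step size, and its iteration count are hypothetical.

```python
from typing import List, Sequence


def td3_bc_actor_loss(
    q_values: Sequence[float],
    policy_actions: Sequence[Sequence[float]],
    data_actions: Sequence[Sequence[float]],
    alpha: float = 2.5,
) -> float:
    """Evaluate the TD3+BC actor loss on a batch.

    loss = -lambda * mean(Q(s, pi(s))) + mean((pi(s) - a)^2),
    with lambda = alpha / mean(|Q(s, pi(s))|). During training lambda is
    treated as a constant (no gradient flows through it).
    """
    n = len(q_values)
    lam = alpha / (sum(abs(q) for q in q_values) / n)
    q_term = -lam * sum(q_values) / n
    squared = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, data_actions)
        for p, a in zip(pa, da)
    ]
    return q_term + sum(squared) / len(squared)


def fit_bc_action(targets: List[float], lr: float = 0.1, steps: int = 200) -> float:
    """Gradient descent on mean((c - a)^2) for one deterministic scalar action c."""
    c = 0.5  # arbitrary initialisation
    for _ in range(steps):
        grad = sum(2.0 * (c - a) for a in targets) / len(targets)
        c -= lr * grad
    return c


if __name__ == "__main__":
    # Two copies of a reflection-fixed state: original action +1, mirrored action -1.
    print(td3_bc_actor_loss([2.0, 2.0], [[0.0], [0.0]], [[1.0], [-1.0]]))  # -1.5
    # Unimodal data: the BC fit recovers the demonstrated action.
    print(abs(fit_bc_action([1.0]) - 1.0) < 1e-9)  # True
    # Mirrored (bimodal) data: the BC fit collapses to the mean of the two modes.
    print(abs(fit_bc_action([1.0, -1.0])) < 1e-9)  # True
```

In the full objective the Q term can pull the action toward one of the modes, so the collapse shown here is the worst case of the behavior-cloning term alone.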