Instilling moral value alignment by means of multi-objective reinforcement learning

AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value aligned beh...

Bibliographic Details
Authors: Rodriguez Soto, Manel; Serramià Amorós, Marc; López Sánchez, Maite; Rodríguez-Aguilar, Juan A. (Juan Antonio)
Format: article
Status: Published version
Publication Date: 2022
Country: Spain
Institution: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:2445/192920
Online Access: https://hdl.handle.net/2445/192920
Access Level: Open access
Keyword: Intel·ligència artificial
Aprenentatge per reforç (Intel·ligència artificial)
Ètica
Aspectes morals
Artificial intelligence
Reinforcement learning
Ethics
Moral aspects
id ES_544fdd2a86b248d8df2da0da3f740f5c
oai_identifier_str oai:recercat.cat:2445/192920
network_acronym_str ES
network_name_str España
repository_id_str
dc.title.none.fl_str_mv Instilling moral value alignment by means of multi-objective reinforcement learning
dc.creator.none.fl_str_mv Rodriguez Soto, Manel
Serramià Amorós, Marc
López Sánchez, Maite
Rodríguez-Aguilar, Juan A. (Juan Antonio)
dc.subject.none.fl_str_mv Intel·ligència artificial
Aprenentatge per reforç (Intel·ligència artificial)
Ètica
Aspectes morals
Artificial intelligence
Reinforcement learning
Ethics
Moral aspects
description AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, to ease the handling of an agent's individual and ethical objectives. The second step consists in designing an environment wherein an agent learns to behave ethically while pursuing its individual objective. We leverage our theoretical results to introduce an algorithm that automates our two-step approach. In the cases where value-aligned behaviour is possible, our algorithm produces a learning environment for the agent wherein it will learn a value-aligned behaviour.
publishDate 2022
dc.date.none.fl_str_mv 2022
2023
2023
2023
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv https://hdl.handle.net/2445/192920
url https://hdl.handle.net/2445/192920
dc.language.none.fl_str_mv English
language_invalid_str_mv English
dc.relation.none.fl_str_mv Reproduction of the document published at: https://doi.org/10.1007/s10676-022-09635-0
Ethics and Information Technology, 2022, vol. 24
https://doi.org/10.1007/s10676-022-09635-0
dc.rights.none.fl_str_mv cc by (c) Manel Rodríguez Soto et al., 2022
http://creativecommons.org/licenses/by/3.0/es/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv cc by (c) Manel Rodríguez Soto et al., 2022
http://creativecommons.org/licenses/by/3.0/es/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 17 p.
application/pdf
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv Articles publicats en revistes (Matemàtiques i Informàtica)
reponame:Recercat. Dipósit de la Recerca de Catalunya
instname:Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
instname_str Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
reponame_str Recercat. Dipósit de la Recerca de Catalunya
collection Recercat. Dipósit de la Recerca de Catalunya