Reinforcement Learning for Value Alignment

Rodríguez Soto, Manel

Bibliographic Details
Author:	Rodríguez Soto, Manel
Resource type:	doctoral thesis
Status:	Published version
Publication date:	2023
Country:	Spain
Institution:	Universidad de Barcelona
Repository:	Dipòsit Digital de la UB
OAI Identifier:	oai:diposit.ub.edu:2445/202126
Online access:	https://hdl.handle.net/2445/202126 http://hdl.handle.net/10803/688998
Access Level:	open access
Keywords:	Intel·ligència artificial; Aprenentatge automàtic; Aprenentatge per reforç (Intel·ligència artificial); Sistemes multiagent; Valors (Filosofia); Artificial intelligence; Machine learning; Reinforcement learning; Multiagent systems; Values

Description
Summary:	[eng] As autonomous agents become increasingly sophisticated and are allowed to perform more complex tasks, it is of utmost importance to guarantee that they act in alignment with human values. The AI literature calls this the value alignment problem. Current approaches apply reinforcement learning to align agents with values, motivated by its recent successes in solving complex sequential decision-making problems. However, they follow an agent-centric approach: they expect the agent to apply the reinforcement learning algorithm correctly to learn an ethical behaviour, without formal guarantees that the learnt behaviour will in fact be ethical. This thesis proposes a novel environment-designer approach for solving the value alignment problem with theoretical guarantees. The approach advances the state of the art with a process for designing ethical environments wherein it is in the agent's best interest to learn ethical behaviours. The process first specifies the ethical knowledge of a moral value in terms that can be used in a reinforcement learning context. It then embeds this knowledge in the agent's learning environment to design an ethical learning environment. The resulting ethical environment incentivises the agent to learn an ethical behaviour while pursuing its own objective. The thesis further contributes a novel algorithm that, following this ethical environment design process, is formally guaranteed to create ethical environments. In other words, the algorithm guarantees that it is in the agent's best interest to learn value-aligned behaviours. The algorithm is illustrated in a case study environment wherein the agent is expected to learn to behave in alignment with the moral value of respect. In this case study, a conversational agent is in charge of conducting surveys and is expected to ask users questions respectfully while trying to obtain as much information as possible. The experimental results in the designed ethical environment confirm the theoretical results: the agent learns an ethical behaviour while pursuing its individual objective.
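
The idea of embedding ethical knowledge in the environment's reward can be illustrated with a minimal sketch. This is not the thesis's algorithm: the toy survey task, the action names, the reward numbers, and the linear weighting of an ethical reward are all assumptions made for illustration, and the sketch uses exact backward induction rather than a learning algorithm.

```python
# Hypothetical toy survey task; every name and number here is an
# illustrative assumption, not taken from the thesis.
ACTIONS = ("ask_politely", "ask_bluntly")
INDIVIDUAL_REWARD = {"ask_politely": 1.0, "ask_bluntly": 2.0}  # information gained
ETHICAL_REWARD = {"ask_politely": 0.0, "ask_bluntly": -1.0}    # penalty for disrespect
N_QUESTIONS = 3  # the survey has a fixed number of questions


def combined_reward(action, ethical_weight):
    """Reward of the designed environment: individual reward plus a
    weighted ethical reward (linear weighting is an assumption)."""
    return INDIVIDUAL_REWARD[action] + ethical_weight * ETHICAL_REWARD[action]


def optimal_policy(ethical_weight):
    """Backward induction over the deterministic, finite-horizon toy task.

    Returns the best action for each question and the total return."""
    value_next = 0.0
    policy = [None] * N_QUESTIONS
    for step in reversed(range(N_QUESTIONS)):
        q_values = {
            a: combined_reward(a, ethical_weight) + value_next for a in ACTIONS
        }
        best = max(ACTIONS, key=q_values.get)
        policy[step] = best
        value_next = q_values[best]
    return policy, value_next


if __name__ == "__main__":
    for weight in (0.0, 2.0):
        policy, total = optimal_policy(weight)
        print(f"ethical_weight={weight}: policy={policy}, return={total}")
```

With an ethical weight of 0 the blunt action is optimal at every question, whereas with a weight of 2 the polite action becomes optimal. In this toy setting, a large enough weight makes respectful behaviour the agent's best interest; how such a weight is chosen with formal guarantees is the subject of the thesis and is not reproduced here.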