CM3 framework for deep multi-agent reinforcement learning in football
| Author: | |
|---|---|
| Resource type: | master's thesis |
| Publication date: | 2023 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/407134 |
| Acceso en línea: | https://hdl.handle.net/2117/407134 |
| Access Level: | open access |
| Keywords: | Multiagent systems; Reinforcement learning; Deep Multi-Agent RL; Curriculum Learning; Unity ML-Agents; Sistemes multiagent; Aprenentatge per reforç; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial |
| Summary: | Collaboration among agents in cooperative and mixed multi-agent environments has been studied extensively in Deep Multi-Agent Reinforcement Learning. Cooperative behavior, and the roles that emerge from it, can benefit the agents collectively when they align their individual objectives towards a common goal, share resources effectively, and communicate efficiently to optimize their combined efforts. Research spans several sub-areas, namely communication in MARL (Comm-MARL), intrinsic rewards, exploration in MARL, curriculum learning, reward shaping, and emergent behavior. Cooperative Multi-Goal Multi-Stage Multi-Agent RL (CM3) is one such framework: it uses curriculum learning to address efficient exploration and a specialized policy function to address credit assignment. It has been tested on three multi-agent environments, where it learned significantly faster than direct adaptations of existing algorithms. In this thesis, we investigate whether football, viewed from a multi-agent perspective, benefits from CM3. Drawing on prior work at the intersection of reinforcement learning and football, and on current state-of-the-art football algorithms such as TiKick and WeKick, which are based primarily on PPO, we compare the actor-critic algorithms A2C and PPO in our multi-agent environment. For this demonstration, we use a modified version of the Unity ML-Agents SoccerTwos environment. We also propose an enhancement to the original CM3 framework that extends training to a third stage in which the reward is independent of the goal. We hypothesized that this could improve coordination, because the team would then share a single collective goal, winning the match, rather than pursuing the individual goals of scoring or saving. |