Autonomous Underwater Vehicle Docking Under Realistic Assumptions Using Deep Reinforcement Learning

Bibliographic Details
Authors: Palomeras Rovira, Narcís; Ridao Rodríguez, Pere
Format: article
Status: Published version
Publication Date: 2024
Country: Spain
Institution: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:10256/25812
Online Access: http://hdl.handle.net/10256/25812
Access Level: Open access
Keyword: Aprenentatge profund
Deep learning
Aprenentatge automàtic
Machine learning
Aprenentatge per reforç
Reinforcement learning
Vehicles submergibles autònoms
Autonomous underwater vehicles
Vehicles submergibles -- Sistemes de control
Submersibles -- Control systems
Description
Summary: This paper addresses the challenge of docking an Autonomous Underwater Vehicle (AUV) under realistic conditions. Traditional model-based controllers are often constrained by the complexity and variability of the ocean environment. To overcome these limitations, we propose a Deep Reinforcement Learning (DRL) approach to manage the homing and docking maneuver. First, we define the proposed docking task in terms of its observations, actions, and reward function, aiming to bridge the gap between theoretical DRL research and docking algorithms tested on real vehicles. Additionally, we introduce a novel observation space that combines raw noisy observations with filtered data obtained using an Extended Kalman Filter (EKF). We demonstrate the effectiveness of this approach through simulations with various DRL algorithms, showing that the proposed observations can produce stable policies in fewer learning steps, outperforming not only traditional control methods but also policies obtained by the same DRL algorithms in noise-free environments.
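
Illustrative sketch (not taken from the paper): the summary describes an observation that combines raw noisy measurements with filtered estimates. The Python snippet below shows one plausible way such a vector could be assembled. It is a simplification under stated assumptions: it uses a linear Kalman filter rather than the EKF named in the summary, a hypothetical two-element state (range to the dock and range rate), and made-up noise values. The names kalman_step and build_observation are hypothetical and do not come from the authors' code.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter (stand-in for an EKF)."""
    # Predict the state and covariance one step ahead.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the noisy measurement z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def build_observation(z_raw, x_filtered):
    """Concatenate the raw measurement with the filtered state estimate."""
    return np.concatenate([z_raw, x_filtered])

# Assumed toy model: state = [range to dock (m), range rate (m/s)];
# the sensor measures range only. All numbers are illustrative.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.25]])

rng = np.random.default_rng(0)
x_est = np.array([10.0, 0.0])   # prior estimate
P = np.eye(2)                   # prior covariance
z = np.array([10.0 + rng.normal(0.0, 0.5)])  # noisy range reading

x_est, P = kalman_step(x_est, P, z, F, H, Q, R)
obs = build_observation(z, x_est)  # [raw range, filtered range, filtered rate]
```

In this sketch the policy would receive both the unfiltered reading and the smoothed estimate, so it could in principle learn how much to trust each. Whether the paper's observation space is laid out this way is not stated in the summary.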