ISSN: 2582-9793

Deep Reinforcement Learning for Autonomous Ground Vehicle Exploration Without A-Priori Maps

Original Research (Published on: 24-Jun-2023)
DOI: 10.54364/AAIML.2023.1170

Shathushan Sivashangaran and Azim Eskandarian

Adv. Artif. Intell. Mach. Learn., 3 (2):1198-1219

Shathushan Sivashangaran: Virginia Tech

Azim Eskandarian: Virginia Tech

Article History: Received on: 18-May-23, Accepted on: 22-Jun-23, Published on: 24-Jun-23

Corresponding Author: Shathushan Sivashangaran

Email: shathushansiva@vt.edu

Citation: Shathushan Sivashangaran, Azim Eskandarian (2023). Deep Reinforcement Learning for Autonomous Ground Vehicle Exploration Without A-Priori Maps. Adv. Artif. Intell. Mach. Learn., 3 (2): 1198-1219

Abstract

    Autonomous Ground Vehicles (AGVs) are essential tools for a wide range of applications because they can operate in hazardous environments with minimal input from a human operator. Effective motion planning is paramount for the successful operation of AGVs. Conventional motion planning algorithms depend on prior knowledge of the environment and offer limited utility in information-poor, dynamically changing environments, such as areas affected by emergencies like fires and earthquakes, and unexplored subterranean environments such as tunnels and lava tubes on Mars. We propose a Deep Reinforcement Learning (DRL) framework for intelligent AGV exploration without a-priori maps that uses Actor-Critic DRL algorithms to learn policies in continuous, high-dimensional action spaces directly from raw sensor data. The DRL architecture uses feedforward neural networks for the actor and critic: the actor network selects linear and angular velocity control actions given the current state, and the critic network evaluates those actions by learning to estimate Q-values, with the goal of maximizing accumulated reward. Three off-policy DRL algorithms, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3) and Soft Actor-Critic (SAC), are trained and compared in two environments of varying complexity, and further evaluated in a third environment with no prior training or knowledge of its map characteristics. The agent is shown to learn optimal policies by the end of each training period, charting quick, collision-free exploration trajectories. The framework is also extensible: it adapts to an unknown environment without changes to the network architecture or hyperparameters. The best-performing algorithm is further evaluated in a realistic 3D environment.
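
The actor-critic structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the state dimension, hidden-layer width, velocity limits, discount factor and noise settings are hypothetical values chosen for illustration, the weights are random and untrained, and only the forward passes and the DDPG and TD3 critic targets are shown (no gradient updates, replay buffer, or SAC's entropy-regularized stochastic policy). The actor squashes its two outputs with tanh so the linear and angular velocities stay within assumed bounds; the TD3 target adds clipped noise to the target action and takes the smaller of two target critics' estimates to curb Q-value overestimation.

```python
import math
import random

# Hypothetical sizes and limits: none of these values come from the paper.
STATE_DIM = 8            # e.g. normalized range readings from a sensor (assumption)
ACTION_DIM = 2           # linear velocity v and angular velocity w
HIDDEN = 16              # hidden-layer width (assumption)
V_MAX, W_MAX = 0.5, 1.0  # velocity bounds in m/s and rad/s (assumption)


def dense(n_in, n_out, rng):
    """Fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, act):
    """Compute act(W x + b) for one layer."""
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


class Actor:
    """Feedforward policy: state -> (v, w), tanh-squashed into the velocity bounds."""

    def __init__(self, rng):
        self.hidden = dense(STATE_DIM, HIDDEN, rng)
        self.out = dense(HIDDEN, ACTION_DIM, rng)

    def __call__(self, state):
        a = forward(self.out, forward(self.hidden, state, relu), math.tanh)
        return [a[0] * V_MAX, a[1] * W_MAX]


class Critic:
    """Feedforward Q-network: (state, action) -> scalar Q-value estimate."""

    def __init__(self, rng):
        self.hidden = dense(STATE_DIM + ACTION_DIM, HIDDEN, rng)
        self.out = dense(HIDDEN, 1, rng)

    def __call__(self, state, action):
        h = forward(self.hidden, list(state) + list(action), relu)
        return forward(self.out, h, lambda z: z)[0]  # linear output layer


def ddpg_target(r, s_next, done, actor_t, critic_t, gamma=0.99):
    """DDPG critic target: y = r + gamma * Q'(s', mu'(s')), or y = r if terminal."""
    if done:
        return r
    return r + gamma * critic_t(s_next, actor_t(s_next))


def td3_target(r, s_next, done, actor_t, critic1_t, critic2_t, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """TD3 critic target: clipped noise on the target action (policy smoothing)
    and the minimum of two target critics (clipped double-Q)."""
    if done:
        return r
    smoothed = []
    for a, limit in zip(actor_t(s_next), (V_MAX, W_MAX)):
        # Noise is drawn in normalized units, clipped, then scaled by each bound.
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(-limit, min(limit, a + eps * limit)))
    q_min = min(critic1_t(s_next, smoothed), critic2_t(s_next, smoothed))
    return r + gamma * q_min


if __name__ == "__main__":
    rng = random.Random(0)
    actor, critic1, critic2 = Actor(rng), Critic(rng), Critic(rng)
    state = [rng.random() for _ in range(STATE_DIM)]
    v, w = actor(state)
    print(f"v = {v:.4f} m/s, w = {w:.4f} rad/s")
    print("DDPG target:", ddpg_target(1.0, state, False, actor, critic1))
    print("TD3 target: ", td3_target(1.0, state, False, actor, critic1, critic2, rng))
```

In actual training, `actor_t` and the target critics would be slowly updated copies of the online networks; the sketch passes the online networks directly for brevity.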
