Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran - A Polyabolo-Inspired Self-Reconfigurable Tiling Robot.

Authors

Le Anh Vu, Veerajagadheswar Prabakaran, Thiha Kyaw Phone, Elara Mohan Rajesh, Nhan Nguyen Huu Khanh

Affiliations

ROAR Lab, Engineering Product Development, Singapore University of Technology and Design, Singapore 487372, Singapore.

Optoelectronics Research Group, Faculty of Electrical and Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam.

Publication information

Sensors (Basel). 2021 Apr 7;21(8):2577. doi: 10.3390/s21082577.

DOI:10.3390/s21082577

PMID:33916995

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8067765/

Abstract

A critical challenge in deploying cleaning robots is achieving complete coverage of the target area. Current tiling robots for area coverage have fixed forms and can therefore clean only certain areas. A reconfigurable system is a creative answer to this optimal coverage problem: a tiling robot aims to cover the entire area by reconfiguring into different shapes according to the area's needs. When sequencing navigation, it is essential to have a structure that lets the robot extend its coverage range while saving energy, so that it can cover larger areas completely with the fewest required actions. This paper presents a complete path planning (CPP) method for hTetran, a polyabolo-based tiling robot, built on a TSP-based reinforcement learning optimization. The framework simultaneously produces robot shapes and sequential trajectories while maximizing the reward of the trained reinforcement learning (RL) model within the predefined polyabolo-based tileset. To this end, a reinforcement learning-based travelling salesman problem (TSP) solver using the proximal policy optimization (PPO) algorithm was trained using the complementary learning computation of the TSP sequencing. The results of the proposed RL-TSP-based CPP for hTetran were compared, in terms of energy and time spent, with conventional tiling-based models in which the TSP is solved by an evolutionary ant colony optimization (ACO) approach. The CPP is able to generate a Pareto-optimal trajectory that improves the robot's navigation in a real environment, using less energy and time than the conventional techniques.
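To make the sequencing idea concrete, the following is a minimal sketch of the underlying TSP view, not the paper's method: tile centres are treated as waypoints, and the visiting order is chosen to keep the path short. Everything here is an assumption for illustration: the `tile_centres` coordinates are hypothetical, Euclidean path length stands in for the energy and time costs the abstract mentions, and a greedy nearest-neighbour heuristic plus exhaustive search replace the PPO-trained and ACO-based solvers.

```python
import itertools
import math


def tour_length(points, order):
    """Length of the open path that visits `points` in the given index order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbour_order(points, start=0):
    """Greedy baseline: always move to the closest unvisited waypoint."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def brute_force_order(points, start=0):
    """Exact optimum by enumeration; only feasible for a handful of waypoints."""
    rest = [i for i in range(len(points)) if i != start]
    best = min(
        itertools.permutations(rest),
        key=lambda perm: tour_length(points, [start, *perm]),
    )
    return [start, *best]


# Hypothetical tile centres (arbitrary units), not taken from the paper.
tile_centres = [(0, 0), (2, 0), (2, 1), (0, 1), (1, 3)]

greedy = nearest_neighbour_order(tile_centres)
optimal = brute_force_order(tile_centres)

print("greedy :", greedy, round(tour_length(tile_centres, greedy), 3))
print("optimal:", optimal, round(tour_length(tile_centres, optimal), 3))
```

On this toy instance the greedy order is longer than the exhaustive optimum, which is the gap that learned or metaheuristic TSP solvers aim to close on larger tilesets. A realistic cost model would also have to account for the cost of changing robot shape between tiles, which this sketch ignores.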
