Wong Ching-Chang, Tsai Tai-Ting, Ou Can-Kun
Department of Electrical and Computer Engineering, Tamkang University, New Taipei City 25137, Taiwan.
Sensors (Basel). 2024 Aug 20;24(16):5370. doi: 10.3390/s24165370.
This study proposes Hybrid Heuristic Proximal Policy Optimization (HHPPO), a method for online 3D bin packing that integrates bin-packing heuristics with the Proximal Policy Optimization (PPO) algorithm of deep reinforcement learning. On the heuristic side, an extreme point priority sorting method sorts the generated extreme points according to their waste space to improve space utilization. In addition, the space status of the container is represented as a 3D grid, and partial support constraints are proposed to increase the possibilities for stacking objects and enhance overall space utilization. On the PPO side, heuristic algorithms are integrated, and the reward function and the action space of the policy network are designed so that the method can effectively complete the online 3D bin-packing task. Experimental results in simulation environments show that the proposed method performs well on online 3D bin-packing tasks. In addition, an environment with image vision is constructed to show that the proposed method enables an actual robot manipulator to complete the bin-packing task successfully and effectively in a real environment.
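The abstract does not spell out the partial support constraints, so the following is only a minimal sketch of the general idea, under stated assumptions. It simplifies the container to a 2D height map rather than the 3D grid used in the study, and it accepts a placement when a hypothetical fraction `min_support` of the item's base rests on the highest surface beneath it. The function names, the threshold value, and the height-map simplification are all illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only: a height-map stand-in for a container grid.
# Names, the 0.6 threshold, and the 2D simplification are assumptions.

def footprint_cells(height_map, x, y, w, d):
    """Heights of all grid cells under a w-by-d footprint whose corner is (x, y)."""
    return [height_map[i][j] for i in range(x, x + w) for j in range(y, y + d)]


def support_ratio(height_map, x, y, w, d):
    """Return (resting height, fraction of the base touching that height)."""
    cells = footprint_cells(height_map, x, y, w, d)
    rest_z = max(cells)  # the item rests on the tallest cell beneath it
    supported = sum(1 for c in cells if c == rest_z)
    return rest_z, supported / len(cells)


def can_place(height_map, x, y, w, d, h, bin_height, min_support=0.6):
    """Check bounds, height limit, and a partial support threshold."""
    width, depth = len(height_map), len(height_map[0])
    if x < 0 or y < 0 or x + w > width or y + d > depth:
        return False  # footprint leaves the container
    rest_z, ratio = support_ratio(height_map, x, y, w, d)
    if rest_z + h > bin_height:
        return False  # item would stick out of the top
    return ratio >= min_support


def place(height_map, x, y, w, d, h):
    """Commit a placement by raising the footprint to the item's top face."""
    rest_z = max(footprint_cells(height_map, x, y, w, d))
    for i in range(x, x + w):
        for j in range(y, y + d):
            height_map[i][j] = rest_z + h
    return rest_z


if __name__ == "__main__":
    grid = [[0] * 4 for _ in range(4)]
    place(grid, 0, 0, 2, 2, 2)               # a 2x2x2 box in the corner
    print(can_place(grid, 0, 0, 3, 2, 1, 5))  # overhangs by one row, 4/6 supported
    print(can_place(grid, 1, 1, 2, 2, 1, 5))  # only one of four cells supported
```

Requiring full support would reject the first query as well, which is the sense in which a partial threshold increases the number of feasible stacking positions. A real system would likely also check the centre of mass, which this sketch ignores.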