Yu Sheng, Zhai Di-Hua, Xia Yuanqing
IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10832-10845. doi: 10.1109/TNNLS.2023.3244186. Epub 2024 Aug 5.
Robotic grasping techniques have been widely studied in recent years. However, grasping in cluttered scenes remains a challenging problem for robots. In such scenes, objects are placed close to each other and there is no free space around them for the robot to place its gripper, making it difficult to find a suitable grasping position. To solve this problem, this article proposes to use a combination of pushing and grasping (PG) actions to aid grasp pose detection and robot grasping. We propose a pushing-grasping combined grasping network (GN), the PG method based on transformer and convolution (PGTC). For the pushing action, we propose a vision transformer (ViT)-based object position prediction network, the pushing transformer network (PTNet), which captures global and temporal features well and can better predict the positions of objects after pushing. For grasping detection, we propose a cross dense fusion network (CDFNet), which makes full use of the RGB image and depth image by fusing and refining them several times. Compared with previous networks, CDFNet is able to detect the optimal grasping position more accurately. Finally, we use the network in both simulation and real UR3 robot grasping experiments and achieve state-of-the-art (SOTA) performance. Video and dataset are available at https://youtu.be/Q58YE-Cc250.
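Since the abstract only outlines CDFNet at a high level, the following is a minimal PyTorch sketch of the general idea it describes: keeping separate RGB and depth feature streams and fusing and refining them several times before predicting a grasp map. The module names, channel widths, number of fusion stages, and the single quality head are illustrative assumptions, not the published architecture.

```python
# Sketch of repeated RGB-depth cross fusion (not the authors' actual CDFNet).
# All layer sizes and the number of fusion stages are assumptions.
import torch
import torch.nn as nn


class CrossFusionBlock(nn.Module):
    """Exchange information between the RGB and depth streams once."""

    def __init__(self, channels: int):
        super().__init__()
        # Each stream is refined using features concatenated from both streams.
        self.rgb_refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.depth_refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb, depth):
        fused = torch.cat([rgb, depth], dim=1)
        # Residual refinement keeps each stream's own features.
        return rgb + self.rgb_refine(fused), depth + self.depth_refine(fused)


class ToyCrossDenseFusion(nn.Module):
    """RGB + depth encoder with repeated cross fusion and a grasp-quality head."""

    def __init__(self, channels: int = 32, num_fusions: int = 3):
        super().__init__()
        self.rgb_stem = nn.Conv2d(3, channels, 3, padding=1)
        self.depth_stem = nn.Conv2d(1, channels, 3, padding=1)
        self.fusions = nn.ModuleList(
            CrossFusionBlock(channels) for _ in range(num_fusions)
        )
        # Per-pixel grasp quality in [0, 1]; angle and width heads are omitted.
        self.quality_head = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 1), nn.Sigmoid()
        )

    def forward(self, rgb, depth):
        r, d = self.rgb_stem(rgb), self.depth_stem(depth)
        for fusion in self.fusions:
            r, d = fusion(r, d)
        return self.quality_head(torch.cat([r, d], dim=1))


if __name__ == "__main__":
    model = ToyCrossDenseFusion()
    rgb = torch.randn(1, 3, 224, 224)    # RGB image
    depth = torch.randn(1, 1, 224, 224)  # aligned depth image
    print(model(rgb, depth).shape)       # torch.Size([1, 1, 224, 224])
```

The sketch only illustrates the fuse-and-refine pattern; how the paper selects pushing versus grasping actions, and the ViT-based PTNet for predicting object positions after a push, are not reproduced here.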