Li Hao, Zheng Lixin
College of Information Science and Engineering, Huaqiao University, Xiamen 361021, China.
College of Engineering, Huaqiao University, Quanzhou 362021, China.
Sensors (Basel). 2024 Dec 13;24(24):7958. doi: 10.3390/s24247958.
Grasping objects of irregular shape and varying size remains a key challenge in robotic grasping. This paper proposes a novel RGB-D-based grasping pose prediction network, termed the Cascaded Feature Fusion Grasping Network (CFFGN), designed for lightweight, efficient, and fast grasping pose estimation. The network employs several innovative structural designs: depth-wise separable convolutions to reduce parameters and improve computational efficiency; convolutional block attention modules to sharpen the model's focus on key features; multi-scale dilated convolutions to enlarge the receptive field and capture multi-scale information; and bidirectional feature pyramid modules to fuse features across levels and improve information flow. On the Cornell dataset, our network predicts grasping poses at 66.7 frames per second, with accuracy rates of 98.6% and 96.9% on the image-wise and object-wise splits, respectively, showing that the method sustains high accuracy at high processing speed. In real-world experiments on a robot equipped with a parallel gripper, our method also proved effective, achieving an average grasping success rate of 95.6%.
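To make the listed components concrete, below is a minimal PyTorch sketch of three of the named building blocks: a depth-wise separable convolution, a CBAM-style attention module, and a multi-scale dilated convolution. The channel counts, dilation rates, and the toy composition at the end are illustrative assumptions, not the paper's actual CFFGN architecture or released code.

```python
# Illustrative sketch (not the authors' code) of blocks named in the abstract.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depth-wise conv (one filter per input channel, groups=in_ch) followed
    by a 1x1 point-wise conv; far fewer parameters than a standard conv."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class CBAM(nn.Module):
    """Convolutional block attention module: channel attention followed by
    spatial attention, as in Woo et al. (2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: shared MLP over average- and max-pooled maps.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise avg/max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))


class MultiScaleDilatedConv(nn.Module):
    """Parallel 3x3 convs with increasing dilation rates, concatenated and
    fused by a 1x1 conv, to enlarge the receptive field at several scales."""

    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r,
                      bias=False) for r in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    # Toy pipeline on an RGB-D input (3 color channels + 1 depth channel).
    net = nn.Sequential(DepthwiseSeparableConv(4, 32), CBAM(32),
                        MultiScaleDilatedConv(32))
    rgbd = torch.randn(1, 4, 224, 224)
    print(net(rgbd).shape)  # torch.Size([1, 32, 224, 224])
```

The depth-wise separable block is where most of the parameter savings come from: a standard 3x3 conv from C to C channels costs 9C^2 weights, while the depth-wise/point-wise pair costs 9C + C^2, roughly a 9x reduction for large C.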