Chen Guangyan, Wang Meiling, Zhang Qingxiang, Yuan Li, Yue Yufeng
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13368-13382. doi: 10.1109/TNNLS.2023.3267333. Epub 2024 Oct 7.
Point cloud registration is an essential technology in computer vision and robotics. Recently, transformer-based methods have achieved advanced performance in point cloud registration by exploiting the transformer's order invariance and its ability to model dependencies when aggregating information. However, they still suffer from indistinct feature extraction and sensitivity to noise and outliers, owing to three major limitations: 1) the adoption of CNNs fails to model global relations due to their local receptive fields, leaving the extracted features susceptible to noise; 2) the shallow-wide architecture of transformers and the lack of positional information lead to indistinct feature extraction through inefficient information interaction; and 3) insufficient consideration of geometric compatibility leads to ambiguous identification of incorrect correspondences. To address these limitations, a novel full transformer network for point cloud registration is proposed, named the deep interaction transformer (DIT), which incorporates: 1) a point cloud structure extractor (PSE) to retrieve structural information and model global relations with the local feature integrator (LFI) and transformer encoders; 2) a deep-narrow point feature transformer (PFT) to facilitate deep information interaction across a pair of point clouds with positional information, such that the transformers establish comprehensive associations and directly learn the relative positions between points; and 3) a geometric matching-based correspondence confidence evaluation (GMCCE) method to measure spatial consistency and estimate correspondence confidence via the designed triangulated descriptor. Extensive experiments on the ModelNet40, ScanObjectNN, and 3DMatch datasets demonstrate that our method precisely aligns point clouds and consequently achieves superior performance compared with state-of-the-art methods.
The code is publicly available at https://github.com/CGuangyan-BIT/DIT.
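To make the correspondence-confidence idea concrete, the sketch below illustrates a generic spatial-consistency check followed by a weighted rigid fit. It is not the paper's GMCCE triangulated descriptor: the `distance_consistency_confidence` scoring is a simpler, standard heuristic (rigid motions preserve pairwise distances, so inconsistent correspondences score low), and the alignment step is the classic weighted Kabsch/SVD solver; all function names here are illustrative, not from the DIT codebase.

```python
import numpy as np

def distance_consistency_confidence(src, tgt):
    """Score each putative correspondence (src[i] <-> tgt[i]) by how well
    pairwise distances are preserved. Rigid transforms preserve distances,
    so an outlier correspondence distorts its row of the distance matrix.
    NOTE: a simple stand-in for the paper's triangulated GMCCE descriptor.
    """
    d_src = np.linalg.norm(src[:, None] - src[None, :], axis=-1)
    d_tgt = np.linalg.norm(tgt[:, None] - tgt[None, :], axis=-1)
    compat = np.exp(-np.abs(d_src - d_tgt))  # 1.0 = perfectly consistent
    return compat.mean(axis=1)               # per-correspondence confidence

def weighted_kabsch(src, tgt, w):
    """Weighted least-squares rigid transform (R, t) via SVD (Kabsch),
    minimizing sum_i w_i * ||R @ src[i] + t - tgt[i]||^2."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)    # weighted centroids
    mu_t = (w[:, None] * tgt).sum(axis=0)
    H = (src - mu_s).T @ (w[:, None] * (tgt - mu_t))
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps det(R) = +1 (a proper rotation).
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

A typical use is to score all putative correspondences, down-weight or discard the low-confidence ones, and then solve for the transform with the survivors, which is the general confidence-then-estimate pipeline the abstract describes.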