He Shuting, Jiang Xudong, Jiang Wei, Ding Henghui
IEEE Trans Image Process. 2023;32:3199-3211. doi: 10.1109/TIP.2023.3279660. Epub 2023 Jun 7.
In this work, we address the challenging task of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is mainly driven by pre-training on large-scale datasets such as ImageNet; a feature extractor pre-trained on large-scale 2D datasets greatly helps 2D few-shot learning. However, the development of 3D deep learning is hindered by the limited volume and instance modality of datasets, owing to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation for few-shot 3D point cloud segmentation. As a consequence, directly extending existing popular prototypical methods of 2D few-shot classification/segmentation to 3D point cloud segmentation does not work as well as it does in the 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module that adapts prototypes from the support point cloud feature space to the query point cloud feature space. With such prototype adaption, we greatly alleviate the issue of large intra-class feature variation in point clouds and significantly improve the performance of few-shot 3D segmentation. Besides, to enhance the representation of prototypes, we introduce a Self-Reconstruction (SR) module that enables the prototype to reconstruct the support mask as well as possible. Moreover, we further consider zero-shot 3D point cloud semantic segmentation, where no support sample is available. To this end, we introduce category words as semantic information and propose a semantic-visual projection model to bridge the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by a considerable 7.90% and 14.82% under the 2-way 1-shot setting on the S3DIS and ScanNet benchmarks, respectively.
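To make the prototypical paradigm the abstract builds on concrete, the following is a minimal sketch (not the authors' QGPA implementation) of the standard baseline it extends: a class prototype is obtained by masked average pooling over the labeled support point features, and each query point is assigned to the most similar prototype by cosine similarity. The function names and the use of NumPy here are illustrative assumptions.

```python
import numpy as np

def masked_average_prototype(support_feats, support_mask):
    """Masked average pooling: average the support point features that
    fall inside the binary class mask to form one class prototype.

    support_feats: (N, D) per-point features of a support point cloud
    support_mask:  (N,) binary mask marking points of the target class
    returns:       (D,) class prototype
    """
    return support_feats[support_mask.astype(bool)].mean(axis=0)

def segment_query(query_feats, prototypes):
    """Label each query point with the index of the prototype that has
    the highest cosine similarity to its feature.

    query_feats: (M, D) per-point features of the query point cloud
    prototypes:  (C, D) one prototype per class
    returns:     (M,) predicted class index per query point
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = q @ p.T  # (M, C) cosine similarities
    return sims.argmax(axis=1)
```

The issue the abstract identifies is that, with weak 3D pre-training, support and query features of the same class can drift apart, so a prototype pooled purely in the support feature space (as above) matches query points poorly; QGPA addresses this by adapting the prototype toward the query feature space before matching.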