
CMFAN: Cross-Modal Feature Alignment Network for Few-Shot Single-View 3D Reconstruction.

Author Information

Lai Lvlong, Chen Jian, Zhang Zehong, Lin Guosheng, Wu Qingyao

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5522-5534. doi: 10.1109/TNNLS.2024.3383039. Epub 2025 Feb 28.

Abstract

Few-shot single-view 3D reconstruction learns to reconstruct objects of novel categories based on a query image and a few support shapes. However, since the query image and the support shapes are of different modalities, there is an inherent feature misalignment problem that damages the reconstruction. Previous works in the literature do not consider this problem. To this end, we propose the cross-modal feature alignment network (CMFAN) with two novel techniques. One is a model pretraining strategy, namely, cross-modal contrastive learning (CMCL), where the 2D images and 3D shapes of the same objects compose the positives and those from different objects form the negatives. With CMCL, the model learns to embed the 2D and 3D modalities of the same object into a tight area in the feature space and push away those from different objects, thus effectively aligning the global cross-modal features. The other is cross-modal feature fusion (CMFF), which further aligns and fuses the local features. Specifically, it first re-represents the local features with a cross-attention operation, making the local features share more information. Then, CMFF generates a descriptor for the support features and attaches it to each local feature vector of the query image with dense concatenation. Moreover, CMFF can be applied to multilevel local features, bringing further gains. We conduct extensive experiments to evaluate the effectiveness of our designs, and CMFAN sets new state-of-the-art performance on all of the 1-/10-/25-shot tasks of the ShapeNet and ModelNet datasets.
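The CMCL pretraining described above (same-object 2D/3D pairs as positives, different-object pairs as negatives) reads like a symmetric InfoNCE-style contrastive objective. The sketch below illustrates that reading in PyTorch; the function name `cmcl_loss`, the temperature value, and the symmetric two-direction form are illustrative assumptions, not the paper's verified loss.

```python
import torch
import torch.nn.functional as F

def cmcl_loss(img_emb, shape_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over paired 2D/3D embeddings.

    img_emb, shape_emb: (B, D) embeddings from the 2D image encoder and
    the 3D shape encoder; row i of each tensor comes from the same object,
    so the diagonal of the similarity matrix holds the positives and all
    off-diagonal entries are negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    shape_emb = F.normalize(shape_emb, dim=-1)

    # (B, B) similarity matrix between every image and every shape.
    logits = img_emb @ shape_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Pull matching 2D/3D pairs together and push mismatched pairs apart,
    # in both the image-to-shape and shape-to-image directions.
    loss_i2s = F.cross_entropy(logits, targets)
    loss_s2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2s + loss_s2i)
```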
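CMFF, as described, re-represents the local features with cross-attention, pools a descriptor from the support features, and densely concatenates it to each local feature vector of the query image. A minimal sketch under those assumptions follows; the single multi-head attention layer, mean pooling for the descriptor, and all tensor shapes are guesses for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CMFF(nn.Module):
    """Sketch of cross-modal feature fusion as described in the abstract."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        # Query-image locals attend to support-shape locals so the two
        # modalities share information before fusion.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feats, support_feats):
        # query_feats:   (B, Nq, D) local features of the query image
        # support_feats: (B, Ns, D) local features of the support shapes
        attended, _ = self.cross_attn(query_feats, support_feats, support_feats)

        # Descriptor of the support features (mean pooling is an assumption;
        # the paper may use a different aggregation).
        descriptor = support_feats.mean(dim=1, keepdim=True)       # (B, 1, D)
        descriptor = descriptor.expand(-1, attended.size(1), -1)   # (B, Nq, D)

        # Dense concatenation: attach the descriptor to every query local.
        return torch.cat([attended, descriptor], dim=-1)           # (B, Nq, 2D)
```

For instance, `CMFF(dim=256)(query_feats, support_feats)` would fuse (B, Nq, 256) query locals with (B, Ns, 256) support locals into (B, Nq, 512) features; applying such a module at several encoder stages would correspond to the multilevel use mentioned in the abstract.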

