
CMFAN: Cross-Modal Feature Alignment Network for Few-Shot Single-View 3D Reconstruction.

Author Information

Lai Lvlong, Chen Jian, Zhang Zehong, Lin Guosheng, Wu Qingyao

Publication Information

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5522-5534. doi: 10.1109/TNNLS.2024.3383039. Epub 2025 Feb 28.

Abstract

Few-shot single-view 3D reconstruction learns to reconstruct objects of novel categories based on a query image and a few support shapes. However, since the query image and the support shapes are of different modalities, there is an inherent feature misalignment problem that damages the reconstruction. Previous works in the literature do not consider this problem. To this end, we propose the cross-modal feature alignment network (CMFAN) with two novel techniques. One is a model pretraining strategy, namely, cross-modal contrastive learning (CMCL), where the 2D images and 3D shapes of the same objects compose the positives and those from different objects form the negatives. With CMCL, the model learns to embed the 2D and 3D modalities of the same object into a tight area in the feature space and push away those from different objects, thus effectively aligning the global cross-modal features. The other is cross-modal feature fusion (CMFF), which further aligns and fuses the local features. Specifically, it first re-represents the local features with a cross-attention operation, making the local features share more information. Then, CMFF generates a descriptor for the support features and attaches it to each local feature vector of the query image with dense concatenation. Moreover, CMFF can be applied to multilevel local features, bringing further gains. We conduct extensive experiments to evaluate the effectiveness of our designs, and CMFAN sets new state-of-the-art performance on all of the 1-/10-/25-shot tasks of the ShapeNet and ModelNet datasets.
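The CMCL pretraining described above (same-object 2D/3D pairs as positives, different-object pairs as negatives) reads like a symmetric InfoNCE-style contrastive objective. The sketch below illustrates that reading in PyTorch; the function name `cmcl_loss`, the temperature value, and the symmetric two-direction form are illustrative assumptions, not the paper's verified loss.

```python
import torch
import torch.nn.functional as F

def cmcl_loss(img_emb, shape_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over paired 2D/3D embeddings.

    img_emb, shape_emb: (B, D) embeddings from the 2D image encoder and
    the 3D shape encoder; row i of each tensor comes from the same object,
    so the diagonal of the similarity matrix holds the positives and all
    off-diagonal entries are negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    shape_emb = F.normalize(shape_emb, dim=-1)

    # (B, B) similarity matrix between every image and every shape.
    logits = img_emb @ shape_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Pull matching 2D/3D pairs together and push mismatched pairs apart,
    # in both the image-to-shape and shape-to-image directions.
    loss_i2s = F.cross_entropy(logits, targets)
    loss_s2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2s + loss_s2i)
```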
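CMFF, as described, re-represents the local features with cross-attention, pools a descriptor from the support features, and densely concatenates it to each local feature vector of the query image. A minimal sketch under those assumptions follows; the single multi-head attention layer, mean pooling for the descriptor, and all tensor shapes are guesses for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CMFF(nn.Module):
    """Sketch of cross-modal feature fusion as described in the abstract."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        # Query-image locals attend to support-shape locals so the two
        # modalities share information before fusion.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feats, support_feats):
        # query_feats:   (B, Nq, D) local features of the query image
        # support_feats: (B, Ns, D) local features of the support shapes
        attended, _ = self.cross_attn(query_feats, support_feats, support_feats)

        # Descriptor of the support features (mean pooling is an assumption;
        # the paper may use a different aggregation).
        descriptor = support_feats.mean(dim=1, keepdim=True)       # (B, 1, D)
        descriptor = descriptor.expand(-1, attended.size(1), -1)   # (B, Nq, D)

        # Dense concatenation: attach the descriptor to every query local.
        return torch.cat([attended, descriptor], dim=-1)           # (B, Nq, 2D)
```

For instance, `CMFF(dim=256)(query_feats, support_feats)` would fuse (B, Nq, 256) query locals with (B, Ns, 256) support locals into (B, Nq, 512) features; applying such a module at several encoder stages would correspond to the multilevel use mentioned in the abstract.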

