
VSFormer: Mining Correlations in Flexible View Set for Multi-View 3D Shape Understanding.

Authors

Sun Hongyu, Wang Yongcai, Wang Peng, Deng Haoran, Cai Xudong, Li Deying

Publication

IEEE Trans Vis Comput Graph. 2025 Apr;31(4):2127-2141. doi: 10.1109/TVCG.2024.3381152. Epub 2025 Feb 27.

Abstract

View-based methods have demonstrated promising performance in 3D shape understanding. However, they tend to make strong assumptions about the relations between views or learn the multi-view correlations indirectly, which limits the flexibility of exploring inter-view correlations and the effectiveness of target tasks. To overcome the above problems, this article investigates flexible organization and explicit correlation learning for multiple views. In particular, we propose to incorporate different views of a 3D shape into a permutation-invariant set, referred to as View Set, which removes rigid relation assumptions and facilitates adequate information exchange and fusion among views. Based on that, we devise a nimble Transformer model, named VSFormer, to explicitly capture pairwise and higher-order correlations of all elements in the set. Meanwhile, we theoretically reveal a natural correspondence between the Cartesian product of a view set and the correlation matrix in the attention mechanism, which supports our model design. Comprehensive experiments suggest that VSFormer has better flexibility, higher inference efficiency and superior performance. Notably, VSFormer reaches state-of-the-art results on various 3D recognition datasets, including ModelNet40, ScanObjectNN and RGBD. It also establishes new records on the SHREC'17 retrieval benchmark.
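The abstract's core claim is that treating the views as a permutation-invariant set and applying self-attention lets the model capture inter-view correlations without any rigid ordering assumption. This is not the authors' implementation, but a minimal numpy sketch of that property: scaled dot-product self-attention over a set of view features followed by mean pooling, where permuting the views provably leaves the pooled shape descriptor unchanged. All names and dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of view features.

    x: (n_views, d) -- one row per rendered view of the shape.
    The n_views x n_views attention matrix pairs every view with every
    other view, i.e. it ranges over the Cartesian product of the set.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Row-wise softmax over the pairwise correlation scores.
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ v

n_views, d = 6, 8                      # toy sizes, not from the paper
x = rng.normal(size=(n_views, d))      # stand-in for per-view CNN features
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

# Mean-pool the attended view features into one shape descriptor.
desc = self_attention(x, wq, wk, wv).mean(axis=0)

# Shuffling the view order yields the same descriptor: the set has
# no rigid relation assumption (e.g. no fixed camera ring order).
perm = rng.permutation(n_views)
desc_perm = self_attention(x[perm], wq, wk, wv).mean(axis=0)
assert np.allclose(desc, desc_perm)
```

The invariance follows because a row permutation of the input permutes both axes of the attention matrix and the rows of the output identically, and mean pooling then discards the ordering.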

