T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance.

Authors

Nie Weizhi, Chen Ruidong, Wang Weijie, Lepri Bruno, Sebe Nicu

Publication information

IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):172-189. doi: 10.1109/TPAMI.2024.3463753. Epub 2024 Dec 4.

DOI:10.1109/TPAMI.2024.3463753

Abstract

In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the available 3D model data is too scarce to meet practical demand. Generating high-quality 3D models efficiently from textual descriptions is therefore a promising but challenging way to address this problem. In this paper, inspired by the creative mechanisms of human imagination, which draw on experiential knowledge to fill in a concrete target model from an ambiguous description, we propose a novel text-3D generation model (T2TD). T2TD aims to generate the target model from a textual description with the aid of experiential knowledge, and its creation process simulates the imaginative mechanisms of human beings. First, we introduce a text-3D knowledge graph that preserves the relationships between 3D models and textual semantic information and supplies related shapes, much as human experience would. Second, we propose an effective causal inference model that selects useful feature information from these related shapes, removing unrelated structural information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structural information with the textual information, compensating for the lack of structural information and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms SOTA methods on the Text2Shape datasets.
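
The three stages named in the abstract (retrieve related shapes, keep only text-relevant features, fuse progressively) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so a word-overlap lookup stands in for the text-3D knowledge graph, a cosine-similarity threshold stands in for the causal inference model, and a repeated softmax-weighted blend stands in for the multi-layer transformer. All names, data, and parameters below are hypothetical.

```python
import math

# Hypothetical toy data: every name and number here is illustrative only.
SHAPE_ATTRIBUTES = {
    "chair_01": {"chair", "wooden", "armrest"},
    "chair_02": {"chair", "metal"},
    "table_01": {"table", "wooden", "round"},
}
SHAPE_FEATURES = {
    "chair_01": [0.9, 0.1, 0.3],
    "chair_02": [0.8, 0.0, -0.2],
    "table_01": [-0.1, 0.9, 0.4],
}


def retrieve_related_shapes(text, top_k=2):
    """Stage 1 stand-in: rank shapes by attribute words shared with the text."""
    words = set(text.lower().split())
    scored = [(len(words & attrs), sid) for sid, attrs in SHAPE_ATTRIBUTES.items()]
    scored = [(n, sid) for n, sid in scored if n > 0]
    # Most shared attributes first; ties broken by shape id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _, sid in scored[:top_k]]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_features(text_vec, shape_ids, threshold=0.5):
    """Stage 2 stand-in: drop shape features that do not align with the text."""
    return [
        SHAPE_FEATURES[sid]
        for sid in shape_ids
        if cosine(text_vec, SHAPE_FEATURES[sid]) >= threshold
    ]


def fuse(text_vec, shape_vecs, layers=3):
    """Stage 3 stand-in: blend softmax-weighted shape features in, layer by layer."""
    fused = list(text_vec)
    if not shape_vecs:
        return fused
    for _ in range(layers):
        weights = [math.exp(cosine(fused, v)) for v in shape_vecs]
        total = sum(weights)
        context = [
            sum(w * v[i] for w, v in zip(weights, shape_vecs)) / total
            for i in range(len(fused))
        ]
        # Equal mix of the current representation and the shape context.
        fused = [0.5 * f + 0.5 * c for f, c in zip(fused, context)]
    return fused
```

A call such as `fuse(text_vec, select_features(text_vec, retrieve_related_shapes("a wooden chair with armrest")))` chains the three stages, where `text_vec` is assumed to come from some text encoder that the sketch does not model.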

Similar articles

Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion.

BMC Med Inform Decis Mak. 2021 Nov 29;21(Suppl 9):335. doi: 10.1186/s12911-021-01622-7.

TeKo: Text-Rich Graph Neural Networks With External Knowledge.

IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14699-14711. doi: 10.1109/TNNLS.2023.3281354. Epub 2024 Oct 7.

Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt.

Neural Netw. 2025 May;185:107173. doi: 10.1016/j.neunet.2025.107173. Epub 2025 Jan 18.

TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding.

Sensors (Basel). 2024 Sep 24;24(19):6166. doi: 10.3390/s24196166.

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance.

IEEE Trans Vis Comput Graph. 2025 Feb;31(2):1526-1541. doi: 10.1109/TVCG.2024.3365804. Epub 2025 Jan 6.

Incorporating Domain Knowledge Into Language Models by Using Graph Convolutional Networks for Assessing Semantic Textual Similarity: Model Development and Performance Comparison.

JMIR Med Inform. 2021 Nov 26;9(11):e23101. doi: 10.2196/23101.

Pathology report generation from whole slide images with knowledge retrieval and multi-level regional feature selection.

Comput Methods Programs Biomed. 2025 May;263:108677. doi: 10.1016/j.cmpb.2025.108677. Epub 2025 Feb 27.

Text-guided small molecule generation via diffusion model.

iScience. 2024 Sep 19;27(11):110992. doi: 10.1016/j.isci.2024.110992. eCollection 2024 Nov 15.

Text2NeRF: Text-Driven 3D Scene Generation With Neural Radiance Fields.

IEEE Trans Vis Comput Graph. 2024 Dec;30(12):7749-7762. doi: 10.1109/TVCG.2024.3361502. Epub 2024 Oct 28.

Cited by

Can China's national fitness policy contribute to achieving universal health? Analysis based on the three-dimensional framework.

Front Public Health. 2025 Aug 18;13:1610070. doi: 10.3389/fpubh.2025.1610070. eCollection 2025.

AttnW2V-Enhancer: Leveraging attention and Word2Vec for enhanced enhancer prediction.

Comput Struct Biotechnol J. 2025 Jul 23;27:3275-3284. doi: 10.1016/j.csbj.2025.07.008. eCollection 2025.
