Li Kehan, Zhu Jihua, Cui Zhiming, Chen Xinning, Liu Yang, Wang Fan, Zhao Yue
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7382-7394. doi: 10.1109/TNNLS.2024.3404276. Epub 2025 Apr 4.
Accurate teeth delineation on 3-D dental models is essential for individualized orthodontic treatment planning. Pioneering works like PointNet suggest a promising direction for conducting efficient and accurate 3-D dental model analyses in an end-to-end learnable fashion. Recent studies further imply that multistream architectures that concurrently learn geometric representations from different inputs/views (e.g., coordinates and normals) are beneficial for segmenting teeth with varying conditions. However, such multistream networks typically adopt simple late-fusion strategies to combine features captured from raw inputs that encode complementary but fundamentally different geometric information, potentially hampering their accuracy in end-to-end semantic segmentation. This article presents a hierarchical cross-stream aggregation (HiCA) network to learn more discriminative point/cell-wise representations from multiview inputs for fine-grained 3-D semantic segmentation. Specifically, based upon our multistream backbone with input-tailored feature extractors, we first design a contextual cross-stream aggregation (CA) module conditioned on interstream consistency to jointly boost each view's contextual representation learning. Then, before the late fusion of different streams' outputs for segmentation, we further deploy a discriminative cross-stream aggregation (DA) module to concurrently update all views' discriminative representation learning by leveraging a specific graph attention strategy induced by multiview prototype learning. On both public and in-house datasets of real-patient dental models, our method significantly outperformed state-of-the-art (SOTA) deep learning methods for teeth semantic segmentation. In addition, extended experimental results suggest the applicability of HiCA to other general 3-D shape segmentation tasks. The code is available at https://github.com/ladderlab-xjtu/HiCA.
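To make the contrast between simple late fusion and cross-stream exchange concrete, the following is a minimal sketch and not the HiCA implementation (for that, see the linked repository). It assumes two streams of per-point features (one from coordinates, one from normals) and uses plain scaled dot-product attention as a stand-in for the aggregation step. The function names (`late_fusion`, `cross_stream_update`), the feature sizes, and the attention form are all hypothetical choices made for illustration; the interstream-consistency conditioning and the prototype-induced graph attention described above are not modeled here.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def late_fusion(feat_a, feat_b):
    """Simple late fusion: concatenate the two streams' features per point."""
    return [a + b for a, b in zip(feat_a, feat_b)]


def cross_stream_update(query_feat, other_feat):
    """Hypothetical cross-stream step: every point in one stream attends over
    all points of the other stream and adds the attended context residually."""
    dim = len(query_feat[0])
    scale = math.sqrt(dim)
    updated = []
    for q in query_feat:
        weights = softmax([dot(q, k) / scale for k in other_feat])
        context = [
            sum(w * k[j] for w, k in zip(weights, other_feat))
            for j in range(dim)
        ]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


# Toy data: 6 points, 4 feature channels per stream (illustrative sizes only).
random.seed(0)
num_points, dim = 6, 4
coord_feat = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_points)]
normal_feat = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_points)]

# Baseline: streams never interact before concatenation.
fused_simple = late_fusion(coord_feat, normal_feat)

# Sketch: streams exchange information first, then are fused.
coord_upd = cross_stream_update(coord_feat, normal_feat)
normal_upd = cross_stream_update(normal_feat, coord_feat)
fused_exchanged = late_fusion(coord_upd, normal_upd)
```

In this sketch both updates are computed from the original features, so neither stream sees the other's already-updated output. Whether the actual CA and DA modules update streams in this symmetric way is not stated in the abstract.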