Khondker Fariha Hossain, Sharif Amit Kamran, Joshua Ong, Alireza Tavakkoli
Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, 89557, USA.
Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA.
Sci Rep. 2025 May 7;15(1):15948. doi: 10.1038/s41598-025-91430-0.
The rapid evolution of deep learning has dramatically enhanced medical image segmentation, producing models of unprecedented accuracy in analyzing complex medical images. Deep learning-based segmentation holds significant promise for advancing clinical care and improving the precision of medical interventions. However, the high computational demands and complexity of these models present significant barriers to their application in resource-constrained clinical settings. To address this challenge, we introduce Teach-Former, a novel knowledge distillation (KD) framework that leverages a Transformer backbone to condense the knowledge of multiple teacher models into a single, streamlined student model. Moreover, it excels at interpreting contextual and spatial relationships across multimodal images for more accurate segmentation. Teach-Former stands out by harnessing multimodal inputs (CT, PET, MRI) and distilling both the final predictions and the intermediate attention maps, ensuring a richer transfer of spatial and contextual knowledge. Through this technique, the student model inherits the teachers' capacity for fine segmentation while operating with a significantly reduced parameter count and computational footprint. Additionally, a novel training strategy optimizes knowledge transfer, ensuring the student model captures the intricate feature mappings essential for high-fidelity segmentation. The efficacy of Teach-Former was evaluated on two extensive multimodal datasets, HECKTOR21 and PI-CAI22, encompassing various image types. The results demonstrate that our KD strategy reduces model complexity while surpassing existing state-of-the-art methods.
The findings of this study indicate that the proposed methodology could facilitate efficient segmentation of complex multimodal medical images, supporting clinicians in achieving more precise diagnoses and comprehensive monitoring of pathological conditions ( https://github.com/FarihaHossain/TeachFormer ).
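The abstract does not specify the exact form of the distillation objective. As a rough illustration of the general idea it describes, the sketch below combines a temperature-softened KL-divergence term over the averaged teacher predictions with an MSE term aligning intermediate attention maps; all function names, the averaging scheme, and the weights `T` and `alpha` are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def softmax(z, T=1.0, axis=-1):
    # Temperature-softened softmax, as in classic knowledge distillation.
    e = np.exp((z - z.max(axis=axis, keepdims=True)) / T)
    return e / e.sum(axis=axis, keepdims=True)

def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          student_attn, teacher_attn_list,
                          T=2.0, alpha=0.5):
    """Illustrative multi-teacher distillation loss (NOT the paper's exact
    formulation): KL divergence between the student's softened predictions
    and the teachers' averaged softened predictions, plus an MSE term
    aligning the student's intermediate attention maps with each teacher's."""
    p_s = softmax(student_logits, T)
    p_t = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    # KL(p_t || p_s), with a small epsilon for numerical stability.
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)))
    # Attention-map distillation: mean squared error against each teacher.
    attn_mse = np.mean([(student_attn - a) ** 2 for a in teacher_attn_list])
    # T**2 rescaling keeps gradient magnitudes comparable across temperatures.
    return alpha * kl * T**2 + (1 - alpha) * attn_mse
```

In a training loop, this scalar would be added to the usual supervised segmentation loss (e.g. Dice or cross-entropy against ground-truth masks), so the student learns from both the labels and the teachers' soft predictions.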