Zhang Jiazhe, Zhang Haolin, Jiang Peng, Huang Qin, Zhu Guangya, Chen Jingjing, Cheng Yingling, Ran Shu, Jiang Fusong
College of Engineering Science and Technology, Shanghai Ocean University, Shanghai, 201306, China.
Department of Information, Shanghai Jiao Tong University School of Medicine Affiliated Sixth People's Hospital, Shanghai, 200233, China.
Sci Rep. 2025 Aug 17;15(1):30059. doi: 10.1038/s41598-025-15728-9.
Thyroid cancer is one of the most common types of cancer, and pathological diagnosis based on fine-needle aspiration cytology is the clinical standard for assessing it. However, the complex structure and large data volume of thyroid pathology images make automatic diagnosis challenging in terms of both accuracy and efficiency. To address this practical problem, this paper proposes a knowledge distillation method called Multi-Dimensional Knowledge Distillation, which combines feature-based distillation and response-based distillation. We employ a 12-layer Vision Transformer as the teacher model. Feature-based distillation integrates feature information from the spatial, channel, and class-token dimensions, while response-based distillation is achieved through alignment with targets. We integrate information from these dimensions and compress the knowledge into a 3-layer Vision Transformer, which serves as the student model. The student model is trained and evaluated on a dataset of 22,111 thyroid cytopathological patches. Our student model attains a Top-1 classification accuracy of 94.87%, only 0.55% below the teacher model, while the computational complexity of the model is reduced by approximately a factor of four. In addition, the student model inherits a substantial part of the teacher model's generalization advantages. Together, these results demonstrate the effectiveness of Multi-Dimensional Knowledge Distillation for knowledge transfer.
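As a rough illustration of how a feature-based term and a response-based term of the kind described above might be combined into one training loss, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the student and teacher token matrices have the same shape with the class token in row 0, that "alignment with targets" means matching the teacher's temperature-softened outputs, and that simple mean-pooled descriptors stand in for the spatial and channel features. All function names, the temperature, and the weights `alpha` and `beta` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def response_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the teacher's soft outputs are the alignment targets.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_loss(student_tokens, teacher_tokens):
    """Sum of class-token, spatial, and channel mismatches.

    Each argument is a list of token rows; row 0 is the class token and
    the remaining rows are patch tokens (assumed layout, same shape for
    student and teacher).
    """
    s_cls, t_cls = student_tokens[0], teacher_tokens[0]
    s_patches, t_patches = student_tokens[1:], teacher_tokens[1:]
    # Spatial descriptor: one value per patch position (mean over channels).
    s_spatial = [sum(row) / len(row) for row in s_patches]
    t_spatial = [sum(row) / len(row) for row in t_patches]
    # Channel descriptor: one value per channel (mean over patch positions).
    s_channel = [sum(col) / len(col) for col in zip(*s_patches)]
    t_channel = [sum(col) / len(col) for col in zip(*t_patches)]
    return mse(s_cls, t_cls) + mse(s_spatial, t_spatial) + mse(s_channel, t_channel)


def total_loss(student_tokens, teacher_tokens, student_logits, teacher_logits,
               alpha=1.0, beta=1.0):
    """Weighted sum of the feature-based and response-based terms."""
    return (alpha * feature_loss(student_tokens, teacher_tokens)
            + beta * response_loss(student_logits, teacher_logits))
```

In practice the teacher and student would usually differ in depth rather than token shape, as with the 12-layer and 3-layer Vision Transformers here; if their embedding widths differed, a learned projection would be needed before comparing features, which this sketch omits.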