Xiao Yao, Zhuang Yan, Ling Wenwu, Jiang Shouyu, Chen Ke, Liao Guoliang, Xie Yuhua, Hou Yao, Han Lin, Hua Zhan, Luo Yan, Lin Jiangli
College of Biomedical Engineering, Sichuan University, Chengdu, China.
Department of Ultrasound, West China Hospital, Sichuan University, Chengdu, China.
J Appl Clin Med Phys. 2025 Aug;26(8):e70149. doi: 10.1002/acm2.70149.
Thyroid cancer is one of the most common cancers in clinical practice, and accurate classification of thyroid nodule ultrasound images is crucial for computer-aided diagnosis. Models based solely on a convolutional neural network (CNN) or a Transformer struggle to integrate local and global features, which limits recognition accuracy.
Our method is designed to capture both the key local fine-grained features and the global spatial features essential for thyroid nodule diagnosis simultaneously. It adapts to the irregular morphology of thyroid nodules, dynamically focuses on the key pixel-level regions of thyroid nodules, and thereby improves the model's recognition accuracy and generalization ability.
The proposed multi-scale fusion model, the local and global feature fusion network (LGF-Net), is inspired by the dual-path mechanism of human visual diagnosis and consists of two branches: a CNN branch and a Transformer branch. The CNN branch integrates a wavelet transform and deformable convolution module (WTDCM) to strengthen the model's ability to capture discriminative local features and recognize fine-grained textures. The Transformer branch incorporates an aggregated attention (AA) mechanism, inspired by biological vision, to effectively capture global spatial features. An adaptive feature fusion module (FFM) then integrates the multi-scale features of thyroid nodules, further improving classification performance.
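To make the fusion step concrete, the sketch below shows one common form of adaptive two-branch fusion: a learned gate blends the CNN branch's local features with the Transformer branch's global features channel by channel. The gating form and all weights here are illustrative assumptions; the abstract does not specify the exact design of the FFM.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # channel dimension (illustrative)

# Toy per-channel features from the two branches after spatial pooling:
f_local = rng.standard_normal(C)   # CNN branch: fine-grained local features
f_global = rng.standard_normal(C)  # Transformer branch: global spatial features

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gated fusion (hypothetical FFM form): a gate with random "learned" weights
# decides, channel by channel, how much to trust each branch, then blends them.
w_l = rng.standard_normal(C)
w_g = rng.standard_normal(C)
b = rng.standard_normal(C)
gate = sigmoid(w_l * f_local + w_g * f_global + b)
f_fused = gate * f_local + (1.0 - gate) * f_global

# The fused vector is a convex, per-channel blend of the two branch features.
assert f_fused.shape == (C,)
```

Because the gate lies in (0, 1), each fused channel is a convex combination of the corresponding local and global values, so neither branch can be entirely discarded.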
We evaluated our model on the public thyroid nodule classification dataset (TNCD) and a private clinical dataset using accuracy, recall, precision, and F1-score. On TNCD, the model achieved 81.50%, 79.51%, 79.92%, and 79.70%, respectively; on the private dataset, it reached 91.24%, 88.90%, 90.73%, and 89.73%. On both datasets, the model outperformed state-of-the-art methods. We also conducted ablation studies and visualization analyses to validate the model's components and interpretability.
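The four reported metrics follow their standard definitions. A small self-contained sketch, using illustrative labels rather than the paper's data, computes them for the binary benign/malignant case:

```python
import numpy as np

# Illustrative ground truth and predictions (benign=0, malignant=1);
# these labels are invented for demonstration, not taken from the paper.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # all 0.75 for this toy example
```

For the multi-class or per-dataset figures in the abstract, the same per-class quantities would typically be averaged (e.g., macro-averaged) across classes.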
The experiments demonstrate that our method improves the accuracy of thyroid nodule recognition, shows its strong generalization ability and potential for clinical application, and provides interpretability for clinicians' diagnoses.