Liu Shunlan, Yang Yang, Cai Mingli, Xu Zhirong, He Shaozheng, Su Qichen, Liu Peizhong, Lyu Guorong
Department of Ultrasound Medicine, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China.
College of Engineering, Huaqiao University, Quanzhou, China.
Endocrine. 2025 May 23. doi: 10.1007/s12020-025-04278-9.
PURPOSE: We aimed to evaluate a human-machine collaborative risk assessment model for thyroid nodules that uses local attention mechanisms and multi-scale feature extraction, and to compare its performance with that of radiologists of varying experience levels.

METHODS: A multi-center diagnostic study was conducted using ultrasound image datasets from six hospitals in China. The model was trained on 8397 images from 8063 patients (training set) and validated on 253 images from 245 patients across multiple centers. Its diagnostic performance was compared with that of radiologists with varying levels of experience. An assistive strategy was developed in which radiologists adjusted their diagnoses based on the model's results.

RESULTS: The model achieved recognition accuracies of 0.966, 0.809, 0.826, 0.837, and 0.861 for composition, echogenicity, margin, echogenic foci, and orientation, respectively. The area under the receiver operating characteristic curve (AUROC) of the model for distinguishing benign from malignant nodules was 0.882, significantly higher than that of the junior radiologist (0.789; P < 0.0001). The model's AUROC fell between those of the intermediate (0.837) and senior (0.892) radiologists, with no significant difference from either group (both P > 0.05). With the assistive strategy, the junior radiologist's AUROC improved from 0.789 to 0.859 (P < 0.0001) and sensitivity increased from 66.11% to 80.00% (P < 0.05), while specificity was unchanged.

CONCLUSION: The model accurately identified thyroid nodule risk features and enhanced diagnostic performance, particularly for the junior radiologist, whose sensitivity in diagnosing thyroid nodules improved.
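To make the reported metrics concrete, the following minimal Python sketch shows how AUROC, sensitivity, and specificity are computed for a benign/malignant classification task. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy scores and labels are hypothetical, chosen only for illustration. AUROC is computed here with the standard Mann-Whitney formulation, i.e. the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney formulation.

    labels: 1 = malignant (positive), 0 = benign (negative).
    Counts, over all (malignant, benign) pairs, how often the malignant
    case has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(preds: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Hypothetical binarization: call a nodule malignant if score >= cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]


if __name__ == "__main__":
    # Toy data (hypothetical): model malignancy scores and ground-truth labels.
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(round(auroc(scores, labels), 4))  # prints 0.8333
    sens, spec = sensitivity_specificity(threshold(scores), labels)
    print(sens, spec)
```

The pairwise loop is O(n_malignant × n_benign), which is adequate for a validation set of a few hundred images such as the one described above; larger datasets would typically use a rank-based formula or a library implementation instead.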