Ni Jia-Hui, Liu Yun-Yun, Chen Chao, Shi Yi-Lei, Zhao Xing, Li Xiao-Long, Ye Bei-Bei, Hu Jing-Liang, Mou Li-Chao, Sun Li-Ping, Fu Hui-Jun, Zhu Xiao-Xiang, Zhang Yi-Feng, Guo Lehang, Xu Hui-Xiong
Department of Medical Ultrasound, Shanghai Tenth People's Hospital, Tongji University School of Medicine, YanChang Middle Street 301, Shanghai, China, 86 21-66307539.
Department of Thyroid Surgery, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China.
JMIR Med Inform. 2025 Jul 30;13:e71740. doi: 10.2196/71740.
Most artificial intelligence (AI) models for thyroid nodules are designed to screen for malignancy to guide further interventions; however, these models have not yet been fully implemented in clinical practice.
This study aimed to evaluate AI in real clinical settings for identifying potentially benign thyroid nodules that radiologists had initially deemed to be at risk for malignancy, with the goal of reducing unnecessary fine needle aspiration (FNA) and optimizing management.
We retrospectively collected a validation cohort of thyroid nodules that had undergone FNA. Following standard clinical practice, radiologists had initially assessed these nodules as "suspicious for malignancy" based on ultrasound features, which prompted the FNA procedures. Ultrasound images of these nodules were re-evaluated using a deep learning-based AI system, and its diagnostic performance was assessed in terms of benign nodules correctly identified and malignant nodules erroneously identified as benign. Performance metrics such as sensitivity, specificity, and the area under the receiver operating characteristic curve were calculated. In addition, a separate comparison cohort was retrospectively assembled to compare the AI system's ability to correctly identify benign thyroid nodules with that of radiologists.
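The metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: it assumes malignancy is the positive class (the abstract does not state the convention), the function names are hypothetical, and the labels and scores are made-up toy values.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); 1 = malignant, 0 = benign (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen malignant nodule receives
    a higher score than a randomly chosen benign one (ties count as half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: 3 malignant and 5 benign nodules with hypothetical AI scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

The pairwise form of the AUC is quadratic in the number of nodules, which is acceptable for a sketch; a rank-based implementation would be preferable at the scale of the cohorts described here.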
The validation cohort comprised 4572 thyroid nodules (benign: n=3134, 68.5%; malignant: n=1438, 31.5%). The AI system correctly identified 2719 of the benign nodules (86.8%), reducing unnecessary FNAs from 68.5% (3134/4572) to 9.1% (415/4572). However, 123 malignant nodules (8.6% of malignant cases) were mistakenly identified as benign, most of which were of low or intermediate suspicion. In the comparison cohort, the AI system correctly identified 81.4% (96/118) of benign nodules, outperforming junior and senior radiologists, who identified only 40% and 55%, respectively. The area under the curve (AUC) for the AI model was 0.88 (95% CI 0.85-0.91), higher than that of junior radiologists (AUC=0.43, 95% CI 0.36-0.50; P=.002) and senior radiologists (AUC=0.63, 95% CI 0.55-0.70; P=.003).
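The percentages in the validation cohort follow directly from the reported counts, as the short check below shows. The counts are taken from the text; the variable names are illustrative, and the final sensitivity figure is derived here under the assumption that malignancy is the positive class (it is not reported in the abstract).

```python
# Counts reported for the validation cohort.
total = 4572
benign = 3134
malignant = 1438
benign_identified = 2719   # benign nodules the AI correctly identified
malignant_missed = 123     # malignant nodules the AI labeled benign


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Benign nodules the AI did not identify would still undergo FNA unnecessarily.
remaining_unnecessary = benign - benign_identified

benign_share = pct(benign, total)                   # unnecessary FNAs before AI
identified_share = pct(benign_identified, benign)   # benign correctly identified
remaining_share = pct(remaining_unnecessary, total) # unnecessary FNAs after AI
missed_share = pct(malignant_missed, malignant)     # malignant labeled benign

# Derived, not reported: sensitivity for malignancy under the stated assumption.
derived_sensitivity = pct(malignant - malignant_missed, malignant)

print(benign_share, identified_share, remaining_unnecessary,
      remaining_share, missed_share, derived_sensitivity)
```

Note that the 9.1% figure uses all 4572 nodules as the denominator, so it is the share of the whole cohort that would still receive an unnecessary FNA, not the share of benign nodules.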
Compared with radiologists, AI can better serve as a "goalkeeper" in reducing unnecessary FNAs by identifying benign nodules that radiologists initially assessed as suspicious for malignancy. However, active surveillance is still necessary for all these nodules, since a small number of low-aggressiveness malignant nodules may be mistakenly identified as benign.