From the Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, 197 Ruijin Er Road, 200025, Shanghai, China (W.W.X., X.H.J., Z.H.M., W.W.Z., T.L., H.T.Z., Y.J.D., J.Q.Z.); Department of Scientific Research, Shanghai Aitrox Technology Corporation Limited, Shanghai, China (X.L.G., Y.L., C.C.F., K.Y.Z., Q.F., C.H.); Department of Ultrasound, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China (R.F.Z.); Department of Medical Ultrasound, Affiliated Hospital of Guizhou Medical University, Guiyang, China (Y.G., X.C.); Department of Medical Ultrasound, Yunnan Cancer Hospital & The Third Affiliated Hospital of Kunming Medical University, Kunming, China (X.M.L.); Department of Ultrasound, Yunnan Kungang Hospital, The Seventh Affiliated Hospital of Dali University, Anning, China (N.L.); Department of Ultrasound, Affiliated Hospital of Yan'an University, Yan'an, China (B.Y.B.); Department of Ultrasound, Tangdu Hospital, Fourth Military Medical University, Xi'an, China (Q.Y.L.); Department of Ultrasound, Shanxi Provincial People's Hospital, Taiyuan, China (J.P.Y.); Department of Ultrasound, Traditional Chinese Medical Hospital of Xinjiang Uygur Autonomous Region, Urumqi, Xinjiang Uygur Autonomous Region, China (H.Z.); Department of Ultrasound, Gansu Provincial Cancer Hospital, Lanzhou, China (L.G.); Department of Ultrasound, Jilin Central General Hospital, Jilin, China (B.G.); and College of Health Science and Technology, Shanghai Jiaotong University School of Medicine, Shanghai, China (J.Q.Z.).
Radiology. 2023 Jun;307(5):e221157. doi: 10.1148/radiol.221157.
Background Artificial intelligence (AI) models have improved US assessment of thyroid nodules; however, the lack of generalizability limits the application of these models.
Purpose To develop AI models for segmentation and classification of thyroid nodules in US using diverse data sets from nationwide hospitals and multiple vendors, and to measure the impact of the AI models on diagnostic performance.
Materials and Methods This retrospective study included consecutive patients with pathologically confirmed thyroid nodules who underwent US using equipment from 12 vendors at 208 hospitals across China from November 2017 to January 2019. The detection, segmentation, and classification models were developed based on the subset or complete set of images. Model performance was evaluated by precision and recall, Dice coefficient, and area under the receiver operating characteristic curve (AUC) analyses. Three scenarios (diagnosis without AI assistance, with freestyle AI assistance, and with rule-based AI assistance) were compared across three senior and three junior radiologists to optimize incorporation of AI into clinical practice.
Results A total of 10 023 patients (median age, 46 years [IQR, 37-55 years]; 7669 female) were included. The detection, segmentation, and classification models had an average precision, Dice coefficient, and AUC of 0.98 (95% CI: 0.96, 0.99), 0.86 (95% CI: 0.86, 0.87), and 0.90 (95% CI: 0.88, 0.92), respectively. The segmentation model trained on the nationwide data and classification model trained on the mixed vendor data exhibited the best performance, with a Dice coefficient of 0.91 (95% CI: 0.90, 0.91) and AUC of 0.98 (95% CI: 0.97, 1.00), respectively. The AI model outperformed all senior and junior radiologists (P < .05 for all comparisons), and the diagnostic accuracies of all radiologists were improved (P < .05 for all comparisons) with rule-based AI assistance.
Conclusion Thyroid US AI models developed from diverse data sets had high diagnostic performance among the Chinese population. Rule-based AI assistance improved the performance of radiologists in thyroid cancer diagnosis. © RSNA, 2023
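The abstract reports segmentation quality as a Dice coefficient and classification quality as an AUC. As a reading aid, the following is a minimal Python sketch of how these two metrics are commonly computed. It is not the authors' code: the function names `dice_coefficient` and `roc_auc` and the toy masks, scores, and labels are hypothetical illustrations, not study data.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    pred = [p for row in pred_mask for p in row]
    true = [t for row in true_mask for t in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from the study).
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
dice = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)

malignancy_scores = [0.9, 0.8, 0.3, 0.6]  # model outputs
pathology_labels = [1, 0, 0, 1]           # 1 = malignant, 0 = benign
auc = roc_auc(malignancy_scores, pathology_labels)  # 3 of 4 pairs ranked correctly
```

The pairwise form of the AUC is quadratic in the number of cases and is shown only because it makes the definition explicit; rank-based implementations in standard libraries give the same value more efficiently.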