Fernández Velasco Pablo, Estévez Asensio Lucia, Torres Beatriz, Ortolá Ana, Gómez Hoyos Emilia, Delgado Esther, de Luís Daniel, Díaz Soto Gonzalo
Department of Endocrinology and Nutrition, Hospital Clínico Universitario Valladolid, Valladolid, Spain.
Centro de Investigación de Endocrinología y Nutrición Clínica (CIENC), Facultad de Medicina, Universidad de Valladolid, Valladolid, Spain.
Endocrine. 2025 Jun 5. doi: 10.1007/s12020-025-04287-8.
Thyroid nodules are commonly evaluated using ultrasound-based risk stratification systems, which rely on subjective descriptors. Artificial intelligence (AI) may improve assessment, but its effectiveness in non-subspecialist settings is unclear. This study evaluated the impact of an AI-based decision support system (AI-DSS) on thyroid nodule ultrasound assessments by general endocrinologists (GE) without subspecialty thyroid imaging training.
A prospective cohort study was conducted on 80 patients undergoing thyroid ultrasound in GE outpatient clinics. Thyroid ultrasound was performed by GE as part of routine care, based on clinical judgment. Images were retrospectively analyzed using an AI-DSS (Koios DS), independently of clinician assessments. AI-DSS results were compared with the initial GE evaluations and, for referred patients, with expert evaluations at a subspecialized thyroid nodule clinic (TNC). Agreement was assessed for ultrasound features, for risk classification by the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) and the American Thyroid Association guidelines, and for referral recommendations.
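ACR TI-RADS, one of the two classification schemes compared here, converts ultrasound descriptors into points, maps the point total to a level (TR1 to TR5), and combines the level with nodule size to recommend fine-needle aspiration (FNA) or follow-up. The sketch below encodes the published 2017 ACR TI-RADS point tables and size thresholds under stated assumptions; it is not taken from this study's methods and does not describe how Koios DS computes its output. The feature vocabulary, the function names (`tirads_points`, `tirads_level`, `recommendation`), and the mapping of a 1-point total to TR1 (the ACR chart lists only 0 points for TR1 and 2 points for TR2) are hypothetical choices made for illustration.

```python
# Minimal sketch of ACR TI-RADS (2017) scoring, written from the published
# point tables, not from this study or from Koios DS. Feature names,
# function names, and the TR1 mapping of 1-point totals are assumptions.

COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_irregular": 2,
          "extrathyroidal_extension": 3}
# Echogenic foci are additive: each distinct category present adds points.
FOCI = {"none": 0, "comet_tail": 0, "macrocalcification": 1,
        "rim": 2, "punctate": 3}

# (FNA threshold, follow-up threshold) in cm for TR3 to TR5.
SIZE_THRESHOLDS_CM = {3: (2.5, 1.5), 4: (1.5, 1.0), 5: (1.0, 0.5)}


def tirads_points(composition, echogenicity, shape, margin, foci):
    """Sum the ACR TI-RADS points for one nodule's descriptors."""
    if composition == "spongiform":
        return 0  # ACR chart: no further points are added for spongiform nodules
    return (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
            + SHAPE[shape] + MARGIN[margin]
            + sum(FOCI[f] for f in set(foci)))


def tirads_level(points):
    """Map a point total to a level 1-5 (1 point -> TR1 is an assumption)."""
    if points <= 1:
        return 1
    if points <= 3:
        return points  # 2 points -> TR2, 3 points -> TR3
    return 4 if points <= 6 else 5


def recommendation(level, size_cm):
    """Return 'FNA', 'follow-up', or 'no FNA' from level and maximum diameter."""
    if level not in SIZE_THRESHOLDS_CM:
        return "no FNA"  # TR1 and TR2
    fna_cm, follow_cm = SIZE_THRESHOLDS_CM[level]
    if size_cm >= fna_cm:
        return "FNA"
    return "follow-up" if size_cm >= follow_cm else "no FNA"
```

For example, `tirads_level(tirads_points("solid", "hypoechoic", "wider_than_tall", "smooth", ["punctate"]))` returns 5, and `recommendation(5, 1.2)` returns `"FNA"`, whereas `recommendation(4, 1.2)` returns `"follow-up"`. Because the size cut-offs tighten as the level rises, any systematic shift toward higher-point descriptors also raises the number of nodules that meet FNA criteria.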
AI-DSS assessments differed notably from those of GE, particularly in nodule composition (solid: 80% vs. 36%, p < 0.01), echogenicity (hypoechoic: 52% vs. 16%, p < 0.01), and echogenic foci (microcalcifications: 10.7% vs. 1.3%, p < 0.05). AI-DSS classification led to a higher referral rate than GE assessment (37.3% vs. 30.7%, not statistically significant). Agreement between AI-DSS and GE in ACR TI-RADS scoring was moderate (r = 0.337; p < 0.001), but improved when comparing GE to AI-DSS and TNC subspecialist (r = 0.465; p < 0.05 and r = 0.607; p < 0.05, respectively).
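The three descriptors on which AI-DSS and GE disagreed most all carry ACR TI-RADS points, and the AI-DSS disagreed in the higher-point direction (solid rather than mixed, hypoechoic rather than isoechoic, microcalcifications rather than none). The hypothetical sketch below shows how flipping a single descriptor can move a nodule across a TI-RADS level boundary. It assumes the 2017 ACR point values, treats microcalcifications as ACR "punctate echogenic foci", fixes shape and margin at 0 points (wider-than-tall, smooth), and maps a 1-point total to TR1; the nodule, labels, and function names are illustrative and are not cases from the study.

```python
# Hypothetical sketch: how one descriptor call shifts the ACR TI-RADS level.
# Only the three features reported in the results are varied; shape and
# margin are assumed to contribute 0 points (wider-than-tall, smooth).
COMPOSITION = {"mixed": 1, "solid": 2}
ECHOGENICITY = {"isoechoic": 1, "hypoechoic": 2}
FOCI = {"none": 0, "punctate": 3}  # microcalcifications ~ punctate foci (assumption)


def score(composition: str, echogenicity: str, foci: str) -> int:
    """Points from composition, echogenicity, and echogenic foci only."""
    return COMPOSITION[composition] + ECHOGENICITY[echogenicity] + FOCI[foci]


def level(points: int) -> str:
    """Map a point total to a TR level (1-point totals -> TR1 is an assumption)."""
    if points <= 1:
        return "TR1"
    if points == 2:
        return "TR2"
    if points == 3:
        return "TR3"
    if points <= 6:
        return "TR4"
    return "TR5"


reads = {
    "baseline": ("mixed", "isoechoic", "none"),
    "composition -> solid": ("solid", "isoechoic", "none"),
    "echogenicity -> hypoechoic": ("mixed", "hypoechoic", "none"),
    "foci -> punctate": ("mixed", "isoechoic", "punctate"),
    "all three": ("solid", "hypoechoic", "punctate"),
}

for label, read in reads.items():
    pts = score(*read)
    print(f"{label:28s} {pts} points  {level(pts)}")
```

Under these assumptions the baseline read scores 2 points (TR2); changing composition or echogenicity alone gives 3 points (TR3), adding punctate foci alone gives 5 points (TR4), and all three changes together give 7 points (TR5). This illustrates one plausible mechanism for the higher AI-DSS referral rate, not an analysis the study reports.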
In a non-subspecialist setting, non-adjunct AI-DSS use did not significantly improve risk stratification or reduce hypothetical referrals. The system tended to overestimate risk, potentially leading to unnecessary procedures. Further optimization is required for AI to function effectively in low-prevalence environments.