Division of Endocrine Surgery, UT Southwestern Medical Center, Dallas, TX 75390, USA.
Biomedical Artificial Intelligence Research Lab, UCLA Department of Bioengineering, Los Angeles, CA 90024, USA.
J Clin Endocrinol Metab. 2024 Jun 17;109(7):1684-1693. doi: 10.1210/clinem/dgae277.
Use of artificial intelligence (AI) to predict clinical outcomes in thyroid nodule diagnostics has grown exponentially over the past decade. The greatest challenge is identifying the best model for one's own patient population and operationalizing that model in practice.
We searched PubMed and IEEE Xplore for English-language publications from January 1, 2015 to January 1, 2023 that studied AI-based diagnostic tests on suspected thyroid nodules. We excluded articles that lacked prospective or external validation, were nonprimary literature or duplicates, focused on nonnodular thyroid conditions, did not use AI, or used AI only incidentally in support of an experimental diagnostic outside standard clinical practice. Quality was graded by Oxford level of evidence.
A total of 61 studies were identified; all performed external validation, 16 studies were prospective, and 33 compared a model to physician prediction of ground truth. Statistical validation was reported in 50 papers. A diagnostic pipeline was abstracted, yielding 5 high-level outcomes: (1) nodule localization, (2) ultrasound (US) risk score, (3) molecular status, (4) malignancy, and (5) long-term prognosis. Seven prospective studies validated a single commercial AI; strengths included automating nodule feature assessment from US and assisting the physician in predicting malignancy risk, while weaknesses included automated margin prediction and interobserver variability.
Models predominantly used US images to predict malignancy. Of 4 Food and Drug Administration-approved products, only S-Detect was extensively validated. Implementing an AI model locally requires data sanitization and revalidation to ensure appropriate clinical performance.
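As a rough illustration of what local revalidation could involve, the sketch below drops incomplete records and then compares a model's malignancy calls with pathology-confirmed ground truth. It is a minimal sketch under stated assumptions, not a method taken from the reviewed studies: the record layout and the names `sanitize` and `revalidate` are hypothetical, and real data sanitization would also cover de-identification and image quality checks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (pathology_label, model_prediction),
# where 1 = malignant, 0 = benign, None = missing value.
Record = Tuple[Optional[int], Optional[int]]


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def sanitize(records: Iterable[Record]) -> List[Tuple[int, int]]:
    """Keep only records with both a valid label and a valid prediction."""
    return [(t, p) for t, p in records if t in (0, 1) and p in (0, 1)]


def revalidate(records: Iterable[Record]) -> DiagnosticMetrics:
    """Compute basic diagnostic accuracy metrics on a local cohort."""
    clean = sanitize(records)
    tp = sum(1 for t, p in clean if t == 1 and p == 1)
    fn = sum(1 for t, p in clean if t == 1 and p == 0)
    tn = sum(1 for t, p in clean if t == 0 and p == 0)
    fp = sum(1 for t, p in clean if t == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return DiagnosticMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )
```

Metrics computed this way on a local cohort could then be compared with the figures reported in a model's external validation before the model is used clinically.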