Jassal Karishma, Edwards Melissa, Koohestani Afsaneh, Brown Wendy, Serpell Jonathan W, Lee James C
Monash University Endocrine Surgery Unit, Alfred Hospital, Melbourne, VIC, Australia.
Department of Surgery, Central Clinical School, Monash University, Melbourne, VIC, Australia.
Front Endocrinol (Lausanne). 2025 May 5;16:1506729. doi: 10.3389/fendo.2025.1506729. eCollection 2025.
In recent years, artificial intelligence (AI) tools have been widely studied for thyroid ultrasonography (USG) classification. The real-world applicability of these tools as pre-operative diagnostic aids is limited by model overfitting, a lack of clinician trust, and the absence of gold-standard surgical histology as the ground-truth class label. An ongoing dilemma in clinical thyroidology is surgical decision making for indeterminate thyroid nodules (ITN). Genomic sequencing classifiers (GSC) have been utilised for this purpose; however, cost and limited availability preclude universal adoption, creating an inequity gap. We conducted this review to analyse the current evidence for AI in ITN diagnosis without the use of GSC.
English-language articles evaluating the diagnostic accuracy of AI for ITNs were identified. A systematic search of PubMed, Google Scholar, and Scopus from inception to 18 February 2025 was performed using comprehensive search strategies incorporating MeSH headings and keywords relating to AI, indeterminate thyroid nodules, and pre-operative diagnosis. This systematic review and meta-analysis was conducted in accordance with methods recommended by the Cochrane Collaboration (PROSPERO ID CRD42023438011).
The search strategy yielded 134 records after the removal of duplicates. A total of 20 models were presented in the seven included studies, five of which were radiologically driven, one utilised natural language processing, and one focused on cytology. The pooled meta-analysis incorporated 16 area under the curve (AUC) results derived from 15 models across three studies, yielding a combined estimate of 0.82 (95% CI: 0.81-0.84), which indicates moderate-to-good classification performance across machine learning (ML) and deep learning (DL) architectures. However, substantial heterogeneity was observed, particularly among DL models (I² = 99.7%, pooled AUC = 0.85, 95% CI: 0.85-0.86). Minimal heterogeneity was observed among ML models (I² = 0.7%), with a pooled AUC of 0.75 (95% CI: 0.70-0.81). The meta-regression analysis suggests potential publication bias or systematic differences in model architectures, dataset composition, and validation methodologies.
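To illustrate how a pooled AUC and an I² statistic of this kind can be computed, the following is a minimal sketch of inverse-variance (fixed-effect) pooling with Cochran's Q. It is an assumption for illustration only: the abstract does not state which pooling model or software the authors used, the function names are hypothetical, and the input values are made up rather than taken from the included studies.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance pooling with Cochran's Q and the I^2 statistic.

    Returns (pooled estimate, pooled standard error, Q, I^2 in percent).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical AUCs with 95% CIs (NOT data from the review).
    aucs = [0.75, 0.85, 0.80]
    cis = [(0.70, 0.80), (0.83, 0.87), (0.76, 0.84)]
    ses = [se_from_ci(lo, hi) for lo, hi in cis]

    pooled, pooled_se, q, i2 = pool_fixed_effect(aucs, ses)
    lower, upper = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"pooled AUC = {pooled:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
    print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Under this scheme, models with narrow confidence intervals dominate the pooled estimate, and tightly estimated AUCs that disagree with one another push I² towards 100%. A random-effects model would weight the studies differently and typically widen the pooled interval.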
This review demonstrated the growing potential of AI to be of clinical value in surgical decision making for ITNs; however, the models developed in the included studies were either unsuitable for clinical implementation on performance alone in their current state or lacked robust independent external validation. There is substantial capacity for further development in this field.
https://www.crd.york.ac.uk/PROSPERO/, identifier CRD42023438011.