Lopes Costa Geraldo Lucas, Tasca Petroski Guido, Machado Luis Guilherme, Eulalio Santos Bruno, de Oliveira Ramos Fernanda, Feuerschuette Neto Leo Max, De Luca Canto Graziela
Federal University of Santa Catarina, Florianópolis, Brazil.
Brazilian Center for Evidence-Based Research, Federal University of Santa Catarina, Florianópolis, Brazil.
Abdom Radiol (NY). 2025 Jul;50(7):3199-3213. doi: 10.1007/s00261-024-04771-1. Epub 2024 Dec 25.
To evaluate the diagnostic performance and methodological quality of machine learning (ML) models for detecting pancreatic ductal adenocarcinoma (PDAC) on contrast-enhanced CT images.
Included studies assessed adults with PDAC confirmed by histopathology, in which the test results were interpreted by ML algorithms and data on sensitivity and specificity were reported. Studies that did not meet the inclusion criteria, segmentation-focused studies, studies using multiple classifiers, and non-diagnostic studies were excluded. PubMed, Cochrane Central Register of Controlled Trials, and Embase were searched without restrictions. Risk of bias was assessed using QUADAS-2, and methodological quality was evaluated using the Radiomics Quality Score (RQS) and the Checklist for AI in Medical Imaging (CLAIM). Bivariate random-effects models were used for the meta-analysis of sensitivity and specificity, and I² values and subgroup analyses were used to assess heterogeneity.
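As a rough illustration of the heterogeneity statistic named above, the following is a minimal sketch of how I² can be derived from Cochran's Q for a set of study-level estimates. It is a simplified univariate calculation on the logit scale, not the bivariate random-effects model the authors used. The function name `i_squared` and the per-study counts are hypothetical and are not taken from the included studies.

```python
import math

def i_squared(estimates, variances):
    """Return I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # I² is the share of total variation beyond what chance alone would give.
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical (true positives, false negatives) per study, for illustration only.
counts = [(90, 10), (170, 30), (45, 15)]
logit_sens = [math.log(tp / fn) for tp, fn in counts]
var_logit = [1.0 / tp + 1.0 / fn for tp, fn in counts]  # approximate variance of a logit

print(f"I² for sensitivity: {i_squared(logit_sens, var_logit):.1f}%")
```

Values near 0% suggest the studies agree about as well as sampling error allows, while higher values indicate between-study variation that a subgroup analysis might help explain.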
Nine studies were included, evaluating 12,788 participants, of whom 3,997 were included in the meta-analysis. AI models based on CT scans showed an accuracy of 88.7% (95% CI, 87.7%-89.7%), sensitivity of 87.9% (95% CI, 82.9%-91.6%), and specificity of 92.2% (95% CI, 86.8%-95.5%). The six radiomics studies had an average score of 17.83 RQS points. The nine ML methods had an average CLAIM score of 30.55 points.
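The abstract does not state how the accuracy interval was derived. As a plausibility check only, the sketch below assumes a simple binomial (Wald) interval over the 3,997 participants in the meta-analysis; the helper name `wald_ci` is hypothetical. The sensitivity and specificity intervals come from the bivariate model and would not be reproduced this way.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se

# Assumption: reported accuracy treated as one proportion over 3,997 participants.
low, high = wald_ci(0.887, 3997)
print(f"{low:.1%} to {high:.1%}")  # prints "87.7% to 89.7%"
```

Under that assumption the result matches the reported interval for accuracy, which suggests it reflects the pooled sample size rather than between-study variability.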
Our study is the first to quantitatively synthesize the results of independent studies on this topic, offering insights for clinical application. Despite favorable sensitivity and specificity, the included studies were of low methodological quality, which limits definitive conclusions. Further research is necessary to validate these models before widespread adoption.