Ciapponi Agustín, Ballivian Jamile, Gentile Carolina, Mejia Jhonatan R, Ruiz-Baena Jessica, Bardach Ariel
Instituto de Efectividad Clínica y Sanitaria (IECS), Buenos Aires, Argentina.
Hospital Italiano de Buenos Aires, Servicio de Oftalmología, Buenos Aires, Argentina.
Eye (Lond). 2025 Apr 29. doi: 10.1038/s41433-025-03809-y.
To evaluate the capability of artificial intelligence (AI) in screening for diabetic retinopathy (DR) using digital retinography captured by non-mydriatic (NM) ≥45° cameras, focusing on diagnostic accuracy, effectiveness, and clinical safety.
We performed an overview of systematic reviews (SRs), searching Medline, Embase, CINAHL, and Web of Science up to May 2023. We used the AMSTAR-2 tool to assess the reliability of each SR. We reported meta-analysis estimates or ranges of diagnostic performance figures.
Out of 1336 records, ten SRs were selected, most of them deemed to be of low or critically low quality. Eight primary studies were included in at least five of the ten SRs, and 125 were included in fewer than five SRs. No SR reported efficacy, effectiveness, or safety outcomes. For referable DR, sensitivity was 68-100% and specificity was 20-100%, with an AUROC range of 88-99%. For detecting DR at any stage, sensitivity was 79-100% and specificity was 50-100%, with an AUROC range of 93-98%.
AI demonstrates strong diagnostic potential for DR screening using NM cameras, with adequate sensitivity but variable specificity. While AI is increasingly integrated into routine practice, this overview highlights significant heterogeneity in the AI models and cameras used. Additionally, our study underscores the low quality of existing systematic reviews and the significant challenge of integrating the rapidly growing volume of emerging evidence in this field. Policymakers should carefully evaluate AI tools in specific contexts, and future research must generate updated high-quality evidence to optimize their application and improve patient outcomes.