Bai Amelia, Carty Christopher, Dai Shuan
Department of Ophthalmology, Queensland Children's Hospital, Brisbane, Australia.
Centre for Children's Health Research, Brisbane, Australia.
Saudi J Ophthalmol. 2022 Oct 14;36(3):296-307. doi: 10.4103/sjopt.sjopt_219_21. eCollection 2022 Jul-Sep.
Artificial intelligence (AI) offers considerable promise for retinopathy of prematurity (ROP) screening and diagnosis. The development of deep-learning algorithms to detect the presence of disease may contribute to adequate screening, early detection, and timely treatment of this preventable blinding disease. This review aimed to systematically examine the literature on AI algorithms for detecting ROP. Specifically, we focused on the performance of deep-learning algorithms, measured by sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC), for both the detection and grading of ROP.
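For readers less familiar with the three metrics named above, the following is a minimal sketch of how they are conventionally computed for a binary classifier. It is not taken from any of the reviewed studies: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: fraction of diseased cases flagged; specificity: fraction of healthy cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: expert reference labels and model output scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5; real studies choose their own operating point.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking performance across all thresholds, which is one reason studies often report all three.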
We searched Medline OVID, PubMed, Web of Science, and Embase for studies published from January 1, 2012, to September 20, 2021. Studies evaluating the diagnostic performance of deep-learning models based on retinal fundus images with expert ophthalmologists' judgment as reference standard were included. Studies which did not investigate the presence or absence of disease were excluded. Risk of bias was assessed using the QUADAS-2 tool.
Of the 175 studies identified, 12 were included. Five studies measured performance in detecting the presence of ROP, and seven assessed the presence of plus disease. The average AUROC across 11 studies was 0.98. The average sensitivity and specificity were 95.72% and 98.15%, respectively, for detecting ROP and 91.13% and 95.92%, respectively, for detecting plus disease.
The diagnostic performance of deep-learning algorithms in published studies was high. Few studies presented externally validated results or compared performance with that of expert human graders. Large-scale prospective validation alongside robust study design could improve future studies.