Rambabu Lekaashree, Edmiston Thomas, Smith Brandon G, Kohler Katharina, Kolias Angelos G, Bethlehem Richard A I, Keane Pearse A, Marcus Hani J, Consortium EyeVu, Hutchinson Peter J, Bashford Tom
Department of Medicine, University of Cambridge, Cambridge, United Kingdom.
NIHR Global Health Research Group on Acquired Brain and Spine Injury, University of Cambridge, Cambridge, United Kingdom.
PLOS Digit Health. 2025 Sep 2;4(9):e0000783. doi: 10.1371/journal.pdig.0000783. eCollection 2025 Sep.
Automated detection of papilloedema using artificial intelligence (AI) and retinal images acquired through an ophthalmoscope for triage of patients with potential intracranial pathology could prove beneficial, particularly in resource-limited settings where access to neuroimaging may be limited. However, a comprehensive overview of the current literature in this field is lacking. We conducted a systematic review on the use of AI for papilloedema detection by searching four databases: Ovid MEDLINE, Embase, Web of Science, and IEEE Xplore. Included studies were assessed for quality of reporting using the Checklist for AI in Medical Imaging and appraised for the presence of bias using a novel 5-domain rubric, 'SMART'. For a subset of studies, we also assessed the diagnostic test accuracy using the 'Metadta' command in Stata. Nineteen deep learning systems and eight non-deep learning systems were included. The median number of images of normal optic discs used in the training set was 2509 (IQR 580-9156) and in the testing set was 569 (IQR 119-1378). The number of papilloedema images was lower, with a median of 1292 (IQR 201-2882) in the training set and 201 (IQR 57-388) in the testing set. Age and gender were the two most frequently reported demographic data, included by one-third of the studies. Only ten studies performed external validation. The pooled sensitivity and specificity were calculated to be 0.87 [95% CI 0.76-0.93] and 0.90 [95% CI 0.74-0.97], respectively. Though reported AI model performance values are high, these results need to be interpreted with caution due to highly biased data selection, poor quality of reporting, and limited evidence of reproducibility.
Deep learning models show promise in retinal image analysis of papilloedema; however, external validation using large, diverse datasets in a variety of clinical settings is required before they can be considered a tool for triage of intracranial pathologies in resource-limited areas.
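As an illustration of the accuracy measures reported above, the following is a minimal sketch of how sensitivity and specificity, each with a 95% confidence interval, can be computed for a single study from its 2x2 table. It is not the pooling procedure used in the review (the 'Metadta' command in Stata); the function names and the example counts are hypothetical, and the Wilson score interval is an assumed choice of interval method.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the counts of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # diseased eyes correctly flagged
    specificity = tn / (tn + fp)  # normal eyes correctly cleared
    return sensitivity, specificity


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for one study, for illustration only (not from the review).
tp, fn, tn, fp = 90, 10, 180, 20

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity = {sens:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity = {spec:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
```

Per-study estimates of this kind are the inputs to a meta-analysis; combining them into pooled values requires a model that accounts for between-study variation, which this sketch does not attempt.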