Bogost Jacob, Linderman Rachel E, Slater Robert, Saunders Thomas F, Pacheco Caleb, Pak Jeong, Voland Rick, Blodi Barbara, Domalpally Amitha
A-Eye Research Unit, Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison, Wisconsin.
Wisconsin Reading Center, Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison, Wisconsin.
Ophthalmol Sci. 2025 Apr 7;5(5):100787. doi: 10.1016/j.xops.2025.100787. eCollection 2025 Sep-Oct.
To compare a fully automated artificial intelligence (AI) model, a semiautomated method, and manual planimetry in the longitudinal assessment of geographic atrophy (GA) using fundus autofluorescence images.
A retrospective analysis of 3 GA assessment methods: AI, Heidelberg Eye Explorer semiautomated software (RegionFinder), and manual planimetry.
One hundred eight patients (185 eyes) with GA from a phase IIb clinical trial by GlaxoSmithKline, which evaluated an experimental drug that did not reduce GA enlargement compared with the placebo.
Fundus autofluorescence images of 185 eyes were annotated using manual planimetry, semiautomated RegionFinder, and a fully automated AI model trained and validated on manual planimetry annotations at screening, year 1, and year 2. Artificial intelligence masks were compared with human-guided methods, and regression errors were assessed by stacking masks from consecutive visits. Agreement between methods was assessed using Bland-Altman plots, Dice similarity coefficient (DSC), and comparisons of GA growth rates. Artificial intelligence performance was evaluated based on its need for human edits and frequency of regression errors.
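The regression check described above (stacking masks from consecutive visits) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes the masks are binary, already registered to a common pixel grid, and that a regression error means GA pixels present at an earlier visit are missing at a later one. The function names and the 5% tolerance are hypothetical.

```python
import numpy as np

def regression_fraction(earlier_mask, later_mask):
    """Fraction of earlier-visit GA pixels that are absent at the later visit."""
    earlier = np.asarray(earlier_mask, dtype=bool)
    later = np.asarray(later_mask, dtype=bool)
    if earlier.shape != later.shape:
        raise ValueError("masks must be registered to the same shape")
    earlier_area = int(earlier.sum())
    if earlier_area == 0:
        return 0.0  # nothing to lose if no GA was marked earlier
    # Pixels marked as GA earlier but not later
    lost = int(np.logical_and(earlier, np.logical_not(later)).sum())
    return lost / earlier_area

def has_regression(masks_by_visit, tolerance=0.05):
    """True if any consecutive pair of visits loses more than `tolerance`
    of the earlier GA area (the tolerance is an assumed value)."""
    pairs = zip(masks_by_visit, masks_by_visit[1:])
    return any(regression_fraction(a, b) > tolerance for a, b in pairs)
```

Because GA is generally not expected to shrink, a nonzero lost fraction between visits flags a mask for human review rather than proving an error.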
Agreement between methods was evaluated using Bland-Altman plots, DSC, and intraclass correlation coefficients (ICCs). The mean GA growth rate (mm²/year) and square root transformation of GA size were compared across methods. Artificial intelligence performance was assessed by the percentage of acceptable masks and the frequency of longitudinal regression errors.
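Two of the agreement measures named above have simple standard definitions, sketched here for illustration. The code is not taken from the study; the function names are hypothetical, masks are assumed to be binary arrays of equal shape, and the Bland-Altman limits use the conventional mean difference ± 1.96 standard deviations.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * int(np.logical_and(a, b).sum()) / total

def bland_altman(areas_a, areas_b):
    """Return (bias, lower limit, upper limit) for paired area measurements."""
    a = np.asarray(areas_a, dtype=float)
    b = np.asarray(areas_b, dtype=float)
    diff = a - b
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A DSC of 1 indicates identical masks and 0 indicates no overlap; the Bland-Altman bias corresponds to the mean difference between two methods.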
At screening, the mean GA area was 7.22 mm² with RegionFinder, 8.37 mm² with AI, and 8.66 mm² with manual planimetry. RegionFinder measured smaller GA areas than both AI and manual planimetry, with a mean difference of -1.45 mm² (95% confidence interval [CI]: -1.56, -1.35) versus AI (ICC = 0.945) and -1.87 mm² (95% CI: -1.99, -1.75) versus manual planimetry (ICC = 0.920). Growth rates were comparable between RegionFinder (1.54 mm²/year), AI (1.68 mm²/year), and manual planimetry (1.80 mm²/year) (P = 0.25). Artificial intelligence masks were deemed acceptable in 84.8% of visits, and 81.4% of cases showed no regression over time.
Artificial intelligence accurately measures GA in approximately 85% of cases, requiring human intervention in only 15%, indicating potential to streamline GA measurement in clinical trials while maintaining human oversight.
The author(s) have no proprietary or commercial interest in any materials discussed in this article.