Department of Ophthalmology, University of Washington School of Medicine, Seattle, WA
Department of Ophthalmology, Veterans Affairs Puget Sound Health Care System, Seattle, WA.
Diabetes Care. 2021 May;44(5):1168-1175. doi: 10.2337/dc20-1877. Epub 2021 Jan 5.
With the rising global prevalence of diabetic retinopathy (DR), automated DR screening is needed for primary care settings. Two automated artificial intelligence (AI)-based DR screening algorithms have U.S. Food and Drug Administration (FDA) approval. Several others are in clinical use in other countries while under regulatory consideration, but their real-world performance has not been evaluated systematically. We compared the performance of seven automated AI-based DR screening algorithms (including one FDA-approved algorithm) against human graders when analyzing real-world retinal imaging data.
This was a multicenter, noninterventional device validation study evaluating a total of 311,604 retinal images from 23,724 veterans who presented for teleretinal DR screening at the Veterans Affairs (VA) Puget Sound Health Care System (HCS) or Atlanta VA HCS from 2006 to 2018. Five companies provided seven algorithms, including one with FDA approval, each of which independently analyzed all scans regardless of image quality. The sensitivity and specificity of each algorithm in classifying images as referable DR or not were compared with the original VA teleretinal grades and with a regraded, arbitrated data set. Value per encounter was estimated.
Although high negative predictive values (82.72-93.69%) were observed, sensitivities varied widely (50.98-85.90%). Most algorithms performed no better than humans against the arbitrated data set, but two achieved higher sensitivities, and one yielded comparable sensitivity (80.47%, P = 0.441) and specificity (81.28%, P = 0.195). Notably, one had lower sensitivity (74.42%) for proliferative DR (P = 9.77 × 10⁻⁴) than the VA teleretinal graders. Value per encounter varied at $15.14-$18.06 for ophthalmologists and $7.74-$9.24 for optometrists.
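The metrics reported above all derive from a 2×2 comparison of each algorithm's referable/non-referable call against a reference grade. The sketch below (not the study's code; all counts are hypothetical) shows how sensitivity, specificity, and negative predictive value are computed from such a confusion matrix.

```python
# Illustrative sketch: screening metrics from a 2x2 confusion matrix,
# where "referable DR" per the reference grade is the positive class.
# The counts used here are hypothetical, not from the study.

def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, NPV) as percentages."""
    sensitivity = 100 * tp / (tp + fn)  # referable cases correctly flagged
    specificity = 100 * tn / (tn + fp)  # non-referable cases correctly passed
    npv = 100 * tn / (tn + fn)          # negative calls that are truly negative
    return sensitivity, specificity, npv

# Hypothetical counts for one algorithm versus an arbitrated reference set:
sens, spec, npv = screening_metrics(tp=805, fp=187, tn=813, fn=195)
print(f"sensitivity={sens:.2f}%  specificity={spec:.2f}%  NPV={npv:.2f}%")
```

Note that NPV, unlike sensitivity and specificity, depends on disease prevalence in the screened population, which is one reason all seven algorithms can show high NPVs while their sensitivities diverge widely.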
The DR screening algorithms showed significant performance differences. These results argue for rigorous testing of all such algorithms on real-world data before clinical implementation.