Raumviboonsuk Paisan, Krause Jonathan, Chotcomwongse Peranut, Sayres Rory, Raman Rajiv, Widner Kasumi, Campana Bilson J L, Phene Sonia, Hemarat Kornwipa, Tadarati Mongkol, Silpa-Archa Sukhum, Limwattanayingyong Jirawut, Rao Chetan, Kuruvilla Oscar, Jung Jesse, Tan Jeffrey, Orprayoon Surapong, Kangwanwongpaisan Chawawat, Sukumalpaiboon Ramase, Luengchaichawang Chainarong, Fuangkaew Jitumporn, Kongsap Pipat, Chualinpha Lamyong, Saree Sarawuth, Kawinpanitan Srirut, Mitvongsa Korntip, Lawanasakol Siriporn, Thepchatri Chaiyasit, Wongpichedchai Lalita, Corrado Greg S, Peng Lily, Webster Dale R
1. Department of Ophthalmology, Rajavithi Hospital, Bangkok, Thailand.
2. Google AI, Google, Mountain View, CA, USA.
NPJ Digit Med. 2019 Apr 10;2:25. doi: 10.1038/s41746-019-0099-8. eCollection 2019.
Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Relative to human graders, for detecting referable DR (moderate NPDR or worse), the deep learning algorithm had significantly higher sensitivity (0.97 vs. 0.74, P < 0.001), and a slightly lower specificity (0.96 vs. 0.98, P < 0.001). Higher sensitivity of the algorithm was also observed for each of the categories of severe or worse NPDR, PDR, and DME (P < 0.001 for all comparisons). The quadratic-weighted kappa for determination of DR severity levels by the algorithm and human graders was 0.85 and 0.78 respectively (P < 0.001 for the difference). Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.
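The metrics reported above (sensitivity and specificity for referable DR, and quadratic-weighted kappa for the five-level severity grade) can be illustrated with a minimal sketch. The function names, the 0 to 4 integer coding of severity, and the eight toy grades below are assumptions made for illustration only; they are not the study's data or the authors' code. The one definition taken from the abstract is the referable threshold, moderate NPDR or worse, which is coded here as grade 2 or higher.

```python
from collections import Counter

# Hypothetical coding: 0 = no DR, 1 = mild NPDR, 2 = moderate NPDR,
# 3 = severe NPDR, 4 = PDR. "Referable" = moderate NPDR or worse.
REFERABLE_THRESHOLD = 2


def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity for binary labels (1 = referable)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def quadratic_weighted_kappa(rater_a, rater_b, n_levels):
    """Cohen's kappa with quadratic disagreement weights."""
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            # Expected agreement under independent marginals.
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)
    return 1.0 - weighted_observed / weighted_expected


# Toy grades (NOT study data): a reference standard and one grader.
reference_grades = [0, 0, 1, 2, 2, 3, 4, 4]
grader_grades = [0, 0, 1, 1, 2, 3, 3, 4]

reference_referable = [int(g >= REFERABLE_THRESHOLD) for g in reference_grades]
grader_referable = [int(g >= REFERABLE_THRESHOLD) for g in grader_grades]

sens, spec = sensitivity_specificity(reference_referable, grader_referable)
kappa = quadratic_weighted_kappa(reference_grades, grader_grades, n_levels=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.3f}")
```

Because the false negative rate is 1 minus sensitivity, the reported sensitivities of 0.97 and 0.74 correspond to false negative rates of 0.03 and 0.26, a difference of 0.23, which matches the 23% reduction quoted in the abstract when read as percentage points. Likewise, specificities of 0.96 and 0.98 correspond to false positive rates of 0.04 and 0.02, a difference of 0.02.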