Li Zhixi, Guo Xinxing, Zhang Jian, Liu Xing, Chang Robert, He Mingguang
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China.
Wilmer Eye Institute, Johns Hopkins University, Baltimore, MD, United States.
Front Med (Lausanne). 2023 Mar 1;10:1115032. doi: 10.3389/fmed.2023.1115032. eCollection 2023.
The aim of this study was to prospectively quantify the level of agreement among a deep learning system, non-physician graders, and general ophthalmologists with different levels of clinical experience in detecting referable diabetic retinopathy, age-related macular degeneration, and glaucomatous optic neuropathy.
Deep learning systems for the classification of diabetic retinopathy, age-related macular degeneration, and glaucomatous optic neuropathy were developed using 210,473 fundus photographs, and their accuracy was confirmed through internal and external validation. Five trained non-physician graders and 47 general ophthalmologists from China were randomly selected and included in the analysis. A test set of 300 fundus photographs was randomly selected from an independent dataset of 42,388 gradable images. The gradings of five retinal specialists and five glaucoma specialists served as the reference standard, which was considered achieved when ≥50% of the gradings were consistent among the included specialists. The area under the receiver operating characteristic curve of each group, measured against the reference standard, was used to compare agreement for referable diabetic retinopathy, age-related macular degeneration, and glaucomatous optic neuropathy.
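The two quantities described above, a specialist-consensus reference label and an area under the receiver operating characteristic curve measured against it, can be illustrated with a minimal sketch. This is not the study's code. The function names `reference_label` and `auc` are hypothetical, and reading the ≥50% rule as "an image is referable when at least half of the specialists graded it referable" is an assumption made for illustration. The AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from typing import Sequence


def reference_label(specialist_grades: Sequence[int], threshold: float = 0.5) -> int:
    """Consensus label from binary specialist grades (1 = referable).

    Assumption for illustration: the image is labeled referable when at
    least `threshold` of the specialists graded it referable.
    """
    if not specialist_grades:
        raise ValueError("at least one specialist grade is required")
    positive_fraction = sum(specialist_grades) / len(specialist_grades)
    return 1 if positive_fraction >= threshold else 0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive image receives
    a higher score than a randomly chosen negative one, counting ties as
    one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical grades from five specialists for four images.
grades_per_image = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
reference = [reference_label(g) for g in grades_per_image]

# Hypothetical model scores and one grader's binary calls.
model_scores = [0.9, 0.4, 0.5, 0.1]
grader_calls = [1, 0, 0, 0]

model_auc = auc(reference, model_scores)
grader_auc = auc(reference, grader_calls)
```

When a grader gives only a binary referable/non-referable call, this AUC reduces to the mean of sensitivity and specificity, which places human graders and a continuous-output model on the same scale.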
The test set included 45 images (15.0%) with referable diabetic retinopathy, 46 (15.3%) with age-related macular degeneration, 46 (15.3%) with glaucomatous optic neuropathy, and 163 (55.4%) without these diseases. For referable diabetic retinopathy, the area under the receiver operating characteristic curve for non-physician graders, ophthalmologists with 3-5 years of clinical practice, ophthalmologists with 5-10 years of clinical practice, ophthalmologists with >10 years of clinical practice, and the deep learning system was 0.984, 0.964, 0.965, 0.954, and 0.990, respectively (P = 0.415). The corresponding results were 0.912, 0.933, 0.946, 0.958, and 0.945 for referable age-related macular degeneration (P = 0.145), and 0.675, 0.862, 0.894, 0.976, and 0.994 for referable glaucomatous optic neuropathy (P < 0.001).
The findings of this study suggest that the accuracy of this deep learning system is comparable to that of trained non-physician graders and general ophthalmologists for referable diabetic retinopathy and age-related macular degeneration, and that the deep learning system outperforms trained non-physician graders in detecting referable glaucomatous optic neuropathy.