Coyner Aaron S, Swan Ryan, Campbell J Peter, Ostmo Susan, Brown James M, Kalpathy-Cramer Jayashree, Kim Sang Jin, Jonas Karyn E, Chan R V Paul, Chiang Michael F
Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon.
Department of Ophthalmology, Casey Eye Institute, Oregon Health and Science University, Portland, Oregon.
Ophthalmol Retina. 2019 May;3(5):444-450. doi: 10.1016/j.oret.2019.01.015. Epub 2019 Jan 31.
Accurate image-based ophthalmic diagnosis relies on fundus image clarity. This has important implications for the quality of ophthalmic diagnoses and for emerging methods such as telemedicine and computer-based image analysis. The purpose of this study was to implement a deep convolutional neural network (CNN) for automated assessment of fundus image quality in retinopathy of prematurity (ROP).
Experimental study.
Retinal fundus images were collected from preterm infants during routine ROP screenings.
Six thousand one hundred thirty-nine retinal fundus images were collected from 9 academic institutions. Each image was graded for quality (acceptable quality [AQ], possibly acceptable quality [PAQ], or not acceptable quality [NAQ]) by 3 independent experts. Quality was defined as the ability to assess an image confidently for the presence of ROP. Of the 6139 images, NAQ, PAQ, and AQ images represented 5.6%, 43.6%, and 50.8% of the image set, respectively. Because of the low representation of NAQ images in the data set, images labeled NAQ were grouped into the PAQ category, and a binary CNN classifier was trained using 5-fold cross-validation on 4000 images. A test set of 2109 images was held out for final model evaluation. Additionally, 30 images were ranked from worst to best quality by 6 experts via pairwise comparisons, and the CNN's ability to rank image quality, independent of its binary classification output, was assessed.
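The 5-fold cross-validation protocol described above can be sketched as follows. This is a minimal, hypothetical illustration: the paper trains a deep CNN on fundus images, but here a logistic regression on synthetic stand-in features takes the CNN's place so that only the cross-validation procedure itself is demonstrated. All data, feature dimensions, and the classifier choice are assumptions, not the authors' implementation.

```python
# Minimal sketch of stratified 5-fold cross-validation for a binary
# quality classifier (AQ vs. PAQ). A logistic regression on synthetic
# features stands in for the deep CNN used in the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n = 4000                                  # training-set size reported in the abstract
X = rng.normal(size=(n, 16))              # stand-in image features (hypothetical)
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # 1 = AQ, 0 = PAQ

aucs = []
for train_idx, val_idx in StratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[val_idx])[:, 1]  # probability of class AQ
    aucs.append(roc_auc_score(y[val_idx], scores))

print(f"mean AUC = {np.mean(aucs):.3f} (SD {np.std(aucs):.3f})")
```

Stratified folds keep the AQ/PAQ class proportions roughly constant across splits, which matters here because the merged PAQ+NAQ class and the AQ class are close to balanced (49.2% vs. 50.8%).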
The CNN performance was evaluated using area under the receiver operating characteristic curve (AUC). Spearman's rank correlation was calculated to evaluate the CNN's overall ability to rank images from worst to best quality compared with experts.
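The ranking comparison against the expert consensus can be sketched as follows. This is a hypothetical illustration with synthetic data: the 30-image consensus ranking and the model's quality scores are fabricated here purely to show how Spearman's rank correlation is computed, and do not reproduce the study's data.

```python
# Minimal sketch: comparing a model's quality ranking of 30 images with
# an expert consensus ranking via Spearman's rank correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
expert_rank = np.arange(30)                 # consensus ranking, worst -> best (synthetic)
model_scores = expert_rank + rng.normal(scale=3.0, size=30)  # model scores (synthetic)

# Spearman's rho correlates the two rank orders, ignoring absolute score values.
rho, p_value = spearmanr(expert_rank, model_scores)
print(f"Spearman rho = {rho:.2f}")
```

Because Spearman's coefficient depends only on rank order, it evaluates the model's ordering of image quality independently of how its raw scores are calibrated, which is why it suits the pairwise-comparison ranking task described above.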
The mean AUC for 5-fold cross-validation was 0.958 (standard deviation, 0.005) for the diagnosis of AQ versus PAQ images. The AUC was 0.965 for the test set. The Spearman's rank correlation coefficient on the set of 30 images was 0.90 as compared with the overall expert consensus ranking.
This model accurately assessed retinal fundus image quality in a manner comparable with that of experts. This fully automated model has potential for application in clinical settings, telemedicine, and computer-based image analysis in ROP and for generalizability to other ophthalmic diseases.