Huang Xiaoqin, Sun Jian, Gupta Krati, Montesano Giovanni, Crabb David P, Garway-Heath David F, Brusini Paolo, Lanzetta Paolo, Oddone Francesco, Turpin Andrew, McKendrick Allison M, Johnson Chris A, Yousefi Siamak
Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, United States.
German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany.
Front Med (Lausanne). 2022 Sep 29;9:923096. doi: 10.3389/fmed.2022.923096. eCollection 2022.
To assess the accuracy of probabilistic deep learning models in discriminating normal eyes from eyes with glaucoma using fundus photographs and visual fields.
Algorithm development for discriminating normal and glaucoma eyes using data from a multicenter, cross-sectional, case-control study.
Fundus photograph and visual field data from 1,655 eyes of 929 normal and glaucoma subjects were used to develop and test deep learning models, and an independent group of 196 eyes of 98 normal and glaucoma patients was used to validate the models.
Accuracy and area under the receiver-operating characteristic curve (AUC).
Fundus photographs and OCT images were carefully examined by clinicians to identify glaucomatous optic neuropathy (GON). When GON was detected by the reader, the finding was further evaluated by another clinician. Three probabilistic deep convolutional neural network (CNN) models were developed using 1,655 fundus photographs, 1,655 visual fields, and 1,655 pairs of fundus photographs and visual fields collected from Compass instruments. The models were trained on 80% of the fundus photographs and visual fields and tested on the remaining 20%. Models were further validated using an independent validation dataset. The performance of each probabilistic deep learning model was compared with that of the corresponding deterministic CNN model.
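The 80%/20% partition described above can be illustrated with a minimal sketch. The abstract does not state whether the split was made at the eye or subject level, or whether it was stratified by diagnosis, so the eye-level random shuffle, the seed, and the function name `split_train_test` are all assumptions made for illustration only.

```python
import random


def split_train_test(eye_ids, test_fraction=0.2, seed=0):
    """Shuffle eye identifiers and split them into training and testing sets.

    Assumption: a simple eye-level random split; the study's actual
    procedure is not described in the abstract.
    """
    ids = list(eye_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# 1,655 eyes in the development dataset, as reported in the abstract.
train_ids, test_ids = split_train_test(range(1655))
print(len(train_ids), len(test_ids))
```

Because the 1,655 eyes come from 929 subjects, a split that keeps both eyes of a subject on the same side would avoid leakage between the sets; whether the study did this is not stated here.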
Using the development dataset, the AUCs of the deep learning models in detecting glaucoma from fundus photographs, visual fields, and combined modalities were 0.90 (95% confidence interval: 0.89-0.92), 0.89 (0.88-0.91), and 0.94 (0.92-0.96), respectively. Using the independent validation dataset, the corresponding AUCs were 0.94 (0.92-0.95), 0.98 (0.98-0.99), and 0.98 (0.98-0.99), respectively. Using an early glaucoma subset, the corresponding AUCs were 0.90 (0.88-0.91), 0.74 (0.73-0.75), and 0.91 (0.89-0.93), respectively. Eyes that were misclassified had significantly higher uncertainty in the likelihood of diagnosis than eyes that were classified correctly. The uncertainty level of the correctly classified eyes was much lower in the combined model than in the model based on visual fields only. The AUCs of the deterministic CNN models using fundus photographs, visual fields, and combined modalities were 0.87 (0.85-0.90), 0.88 (0.84-0.91), and 0.91 (0.89-0.94) on the development dataset; 0.91 (0.89-0.93), 0.97 (0.95-0.99), and 0.97 (0.96-0.99) on the independent validation dataset; and 0.88 (0.86-0.91), 0.75 (0.73-0.77), and 0.92 (0.89-0.95) on the early glaucoma subset, respectively.
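As an illustration of how an AUC with a 95% confidence interval of the kind reported above might be computed, the following sketch uses the rank-based (Mann-Whitney) definition of the AUC and a percentile bootstrap. The abstract does not say how the intervals were obtained, so the bootstrap is an assumption, and the labels and scores are hypothetical toy values, not study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a glaucoma eye (label 1) scores higher
    than a normal eye (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample eyes with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for the AUC to exist
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 0 = normal, 1 = glaucoma.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.70, 0.90, 0.60]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores)
print(toy_auc, toy_ci)
```

With only eight toy eyes the interval is very wide; the narrow intervals in the abstract reflect much larger samples.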
Probabilistic deep learning models can detect glaucoma from multi-modal data with high accuracy. Our findings suggest that models based on combined visual field and fundus photograph modalities detect glaucoma with higher accuracy than models based on either modality alone. While probabilistic and deterministic CNN models provided similar performance, probabilistic models also generate a certainty level for each outcome, thus providing another level of confidence in decision making.
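To make the idea of a per-eye certainty level concrete, the sketch below summarizes repeated stochastic predictions for one eye by their mean (the glaucoma probability) and their standard deviation (the uncertainty). This is only one common way to summarize a probabilistic model's output; the abstract does not describe how the study's models quantify uncertainty, and the function name `summarize_predictions`, the 0.5 threshold, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def summarize_predictions(samples, threshold=0.5):
    """Summarize sampled glaucoma probabilities for a single eye.

    Assumption: `samples` holds repeated stochastic outputs of a
    probabilistic model for the same input.
    """
    probability = mean(samples)
    return {
        "probability": probability,
        "uncertainty": pstdev(samples),  # spread of the sampled outputs
        "prediction": int(probability >= threshold),  # 1 = glaucoma
    }


# Hypothetical samples: one eye with consistent outputs, one borderline eye.
confident_eye = summarize_predictions([0.91, 0.93, 0.90, 0.92])
borderline_eye = summarize_predictions([0.35, 0.62, 0.48, 0.71])
print(confident_eye)
print(borderline_eye)
```

In this toy case both eyes are labeled as glaucoma, but the borderline eye carries a much larger uncertainty, which is the kind of signal that could flag a prediction for closer clinical review.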