School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China.
Department of Anesthesia, Shanghai Ninth People's Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China.
Eye (Lond). 2022 Jul;36(7):1433-1441. doi: 10.1038/s41433-021-01552-8. Epub 2021 Jul 1.
To present and validate a deep ensemble algorithm to detect diabetic retinopathy (DR) and diabetic macular oedema (DMO) using retinal fundus images.
A total of 8739 retinal fundus images were collected from a retrospective cohort of 3285 patients. To detect DR and DMO, we developed an ensemble of multiple improved Inception-v4 models. We measured the algorithm's performance and compared it with that of human experts on our primary dataset, and assessed its generalization on the publicly available Messidor-2 dataset. We also systematically investigated how the size and the number of input images used in training each affected model performance. Finally, we analyzed the time budget for training and inference against model performance.
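As an illustration of the ensembling idea, the following is a minimal sketch that assumes the ensemble members are combined by simple averaging of their per-image probabilities. The abstract does not state the combination rule, so the averaging step, the 0.5 threshold, and the function name `ensemble_predict` are all assumptions made for this example.

```python
from statistics import fmean


def ensemble_predict(member_probs, threshold=0.5):
    """Average per-image probabilities across models, then threshold.

    member_probs[m][i] is model m's probability that image i shows
    referable disease. Averaging and the 0.5 cut-off are assumptions
    for illustration, not the method reported in the study.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_images = len(member_probs[0])
    if any(len(p) != n_images for p in member_probs):
        raise ValueError("all members must score the same images")
    # zip(*...) regroups the scores so each tuple holds one image's
    # probabilities from every member.
    averaged = [fmean(per_image) for per_image in zip(*member_probs)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Made-up scores from three hypothetical members on three images.
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.2],
]
avg, labels = ensemble_predict(probs)
```

Averaging tends to smooth out the errors of individual members; a real screening system would choose the operating threshold on a validation set rather than fixing it at 0.5.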
On our primary test dataset, the model achieved an AUC of 0.992 (95% CI, 0.989-0.995), with a sensitivity of 0.925 (95% CI, 0.916-0.936) and a specificity of 0.961 (95% CI, 0.950-0.972) for referable DR, whereas the ophthalmologists' sensitivities ranged from 0.845 to 0.936 and their specificities from 0.912 to 0.971. For referable DMO, the model achieved an AUC of 0.994 (95% CI, 0.992-0.996), with a sensitivity of 0.930 (95% CI, 0.919-0.941) and a specificity of 0.971 (95% CI, 0.965-0.978), whereas the ophthalmologists' sensitivities ranged from 0.852 to 0.946 and their specificities from 0.926 to 0.985.
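For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and AUC can be computed from binary labels and model scores. The data are invented for illustration, the helper names are hypothetical, and the confidence-interval procedure is not covered because the abstract does not describe it.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = referable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is O(n^2) and
    is meant for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: three referable and three non-referable images.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the results report all three.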
This study showed that the deep ensemble model performed excellently in detecting DR and DMO, with good robustness and generalization, and could potentially help support and expand DR/DMO screening programs.