Yang Cheng, Liu Qingyang, Guo Haike, Zhang Min, Zhang Lixin, Zhang Guanrong, Zeng Jin, Huang Zhongning, Meng Qianli, Cui Ying
Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangdong Eye Institute, Guangdong Academy of Medical Sciences, Guangzhou, China.
Department of Ophthalmology, Dongguan People's Hospital, Dongguan, China.
Front Med (Lausanne). 2021 Dec 9;8:773881. doi: 10.3389/fmed.2021.773881. eCollection 2021.
To develop and validate machine learning-based classifiers that use simple non-ocular metrics to detect referable diabetic retinopathy (RDR) in a large-scale Chinese population-based survey. The 1,418 patients with diabetes mellitus identified among 8,952 rural residents screened in the population-based Dongguan Eye Study were used for model development and validation. Eight algorithms [extreme gradient boosting (XGBoost), random forest, naïve Bayes, k-nearest neighbor (KNN), AdaBoost, LightGBM, artificial neural network (ANN), and logistic regression] were used to build models that detect RDR in individuals with diabetes. The area under the receiver operating characteristic curve (AUC) and its 95% confidence interval (95% CI) were estimated using five-fold cross-validation as well as an 80:20 split between training and validation data. The 10 most important features in the machine learning models were duration of diabetes, HbA1c, systolic blood pressure, triglyceride, body mass index, serum creatinine, age, educational level, duration of hypertension, and income level. Based on these top 10 variables, the XGBoost model achieved the best discriminative performance, with an AUC of 0.816 (95% CI: 0.812, 0.820). The AUCs for logistic regression, AdaBoost, naïve Bayes, and random forest were 0.766 (95% CI: 0.756, 0.776), 0.754 (95% CI: 0.744, 0.764), 0.753 (95% CI: 0.743, 0.763), and 0.705 (95% CI: 0.697, 0.713), respectively. A machine learning-based classifier that used 10 easily obtained non-ocular variables was able to detect patients with RDR effectively. The importance scores of the variables offer insight into preventing the occurrence of RDR. Screening for RDR with machine learning provides a useful complementary tool for clinical practice in resource-poor areas with limited ophthalmic infrastructure.
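The abstract reports AUCs with 95% CIs estimated by five-fold cross-validation but does not say how the intervals were derived. The following is a minimal sketch of one way such an estimate could be computed, not the authors' pipeline. It rests on several assumptions: the data are synthetic, the nearest-centroid scorer is a toy stand-in for the eight algorithms named above, and the interval is a normal approximation over the per-fold AUCs. All function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping the class ratio similar in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y):
    """Toy stand-in model (assumption): the per-class feature means."""
    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [mean(col) for col in zip(*rows)]
    return centroid(0), centroid(1)


def score(model, x):
    """Higher when x is closer to the positive centroid than the negative one."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def cross_validated_auc(X, y, k=5, seed=0):
    """Fit on k-1 folds, compute AUC on the held-out fold, repeat for each fold."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in held_out],
                            [score(model, X[i]) for i in held_out]))
    return aucs


def mean_ci95(values):
    """Mean and normal-approximation 95% CI of the mean (an assumed CI method)."""
    m = mean(values)
    half = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Synthetic data only: 300 records, about 20% positives, two shifted features.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
X = [[rng.gauss(1.0 if label else 0.0, 1.0),
      rng.gauss(0.5 if label else 0.0, 1.0)] for label in y]

aucs = cross_validated_auc(X, y, k=5)
m, lo, hi = mean_ci95(aucs)
print(f"AUC {m:.3f} (95% CI: {lo:.3f}, {hi:.3f})")
```

An interval built from only five fold-level AUCs is a rough summary of fold-to-fold variability; bootstrap resampling or repeated cross-validation are common alternatives when a more stable interval is needed.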