Wang Xu, Wang Weijie, Ren Huiling, Li Xiaoying, Wen Yili
Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
Heliyon. 2024 Apr 10;10(9):e29497. doi: 10.1016/j.heliyon.2024.e29497. eCollection 2024 May 15.
Diabetic retinopathy is one of the major complications of diabetes. In this study, a diabetic retinopathy risk prediction model integrating machine learning and SHAP (SHapley Additive exPlanations) was established. The aims were to improve the accuracy of risk prediction, explain the rationale behind the model's predictions, and increase the reliability of the prediction results.
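For reference, the attribution SHAP assigns to feature i is its Shapley value from cooperative game theory. Let F be the set of all features and v(S) the model's expected output when only the features in S are known. The standard definition, together with SHAP's additivity (local accuracy) property, is:

$$
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],
\qquad
f(x) = \phi_0 + \sum_{i \in F} \phi_i, \quad \phi_0 = \mathbb{E}[f(X)].
$$

A positive \(\phi_i\) pushes one patient's predicted retinopathy risk above the average prediction \(\phi_0\). A negative \(\phi_i\) pushes it below.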
Data were preprocessed to handle missing values and outliers, and features were selected by information gain. A diabetic retinopathy risk prediction model was then built with CatBoost, and its outputs were interpreted using SHAP.
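The preprocessing and feature-selection steps can be sketched in plain Python. This is a minimal illustration under stated assumptions. The abstract does not say how missing values or outliers were handled, so median imputation and clipping to Tukey's IQR fences are stand-ins. Information gain is computed as IG = H(Y) − H(Y | X), using the best single-threshold split of each numeric feature. The helper names (`impute_median`, `clip_iqr`, `select_features`, and so on) and the toy values are hypothetical. The CatBoost training step is omitted.

```python
import math
import statistics
from collections import Counter


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in column]


def clip_iqr(column, k=1.5):
    """Clip outliers to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in column]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels, threshold):
    """H(Y) - H(Y | split) for a binary split of a numeric feature at threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def best_information_gain(feature, labels):
    """Best gain over all observed values used as split thresholds."""
    return max(information_gain(feature, labels, t) for t in set(feature))


def select_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: best_information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data (not from the study): 1 = retinopathy, 0 = none.
labels = [0, 0, 0, 0, 1, 1]
columns = {
    "HbA": clip_iqr(impute_median([5.6, None, 6.1, 5.9, 9.1, 9.8])),
    "HEIGHT": clip_iqr([170, 160, 175, 165, 172, 158]),
}
print(select_features(columns, labels, k=1))  # ['HbA']
```

In a real pipeline, the retained features would then be passed to a gradient-boosting classifier such as CatBoost's `CatBoostClassifier`.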
The study used 1,000 records from the diabetes complication early warning dataset of the National Clinical Medical Sciences Data Center. In the comparative model test, the CatBoost-based model performed best at predicting diabetic retinopathy. ALB_CR, HbA, UPR_24, NEPHROPATHY and SCR were positively correlated with diabetic retinopathy, whereas CP, HB, ALB, DBILI and CRP were negatively correlated with it. HEIGHT, WEIGHT and ESR showed no significant relationship with diabetic retinopathy.
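The sketch below makes the SHAP-based reading of these associations concrete. It computes exact Shapley values by brute force for a small model. It then labels each feature's direction by the sign of the covariance between the feature's values and its Shapley values, which is roughly what a rising or falling trend in a SHAP summary or dependence plot conveys.

Several assumptions apply:
- `toy_risk` is a hypothetical linear-plus-interaction score standing in for the trained CatBoost model.
- Absent features take a single baseline value rather than being averaged over background data as SHAP's explainers do.
- The sample values are invented.
- The covariance-sign rule is an illustrative heuristic, not necessarily how the authors derived the reported correlations.

A direction of "none" here only means the model ignores the feature. That is not the same as the statistical non-significance reported for HEIGHT, WEIGHT and ESR.

```python
import itertools
import math


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    A feature outside the coalition takes its baseline value. The loops make
    O(n * 2^n) model calls, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                without_i = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_i = {**without_i, i: x[i]}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


def association_direction(model, samples, baseline, feature, tol=1e-9):
    """Sign of the covariance between a feature's values and its Shapley values."""
    values = [s[feature] for s in samples]
    contribs = [shapley_values(model, s, baseline)[feature] for s in samples]
    mean_v = sum(values) / len(values)
    mean_c = sum(contribs) / len(contribs)
    cov = sum((v - mean_v) * (c - mean_c) for v, c in zip(values, contribs))
    if abs(cov) < tol:
        return "none"
    return "positive" if cov > 0 else "negative"


def toy_risk(x):
    """Hypothetical stand-in for a trained model (NOT the study's CatBoost model)."""
    return 0.8 * x["HbA"] - 0.5 * x["HB"] + 0.3 * x["HbA"] * x["NEPHROPATHY"]


baseline = {"HbA": 7.0, "HB": 130.0, "NEPHROPATHY": 0, "HEIGHT": 165.0}
samples = [
    {"HbA": 6.0, "HB": 140.0, "NEPHROPATHY": 0, "HEIGHT": 170.0},
    {"HbA": 8.5, "HB": 120.0, "NEPHROPATHY": 1, "HEIGHT": 160.0},
    {"HbA": 9.5, "HB": 110.0, "NEPHROPATHY": 1, "HEIGHT": 175.0},
]
for feature in baseline:
    print(feature, association_direction(toy_risk, samples, baseline, feature))
# Prints one per line: HbA positive, HB negative, NEPHROPATHY positive, HEIGHT none
```

For tree ensembles such as CatBoost, the `shap` package's `TreeExplainer` computes these values efficiently instead of enumerating all 2^n coalitions.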
Risk factors for diabetic retinopathy include poor renal function, elevated blood glucose, liver disease, hematological disorders and abnormal blood pressure, among others. Diabetic retinopathy can be prevented by monitoring and effectively controlling the relevant indicators. This study also analyzed how the features influence one another, to further explore potential factors in diabetic retinopathy. This analysis can provide new methods and ideas for the early prevention and clinical diagnosis of diabetic retinopathy.