Xiao Li, Tang Lixuan, Kuang Wenxuan, Yang Yijing, Deng Ying, Lu Jing, Peng Qinghua, Yan Junfeng
School of Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China.
School of Medicine, Hunan University of Chinese Medicine, Changsha, China.
Medicine (Baltimore). 2024 Dec 20;103(51):e40896. doi: 10.1097/MD.0000000000040896.
This study aimed to take full advantage of both traditional Chinese medicine (TCM) and western medicine, combined with machine learning, to identify risk factors for diabetic retinopathy (DR), build a better DR risk prediction model, and provide a basis for DR screening and treatment. In a retrospective study of real-world DR cases, the electronic medical records of patients who met the screening criteria were collected. Recursive Feature Elimination with Cross-Validation (RFECV) was used for feature selection. A prediction model was then built with a Gradient Boosting Machine (GBM) and compared with 4 other popular machine learning techniques: Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). The models were evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC) as indicators, and grid search was used to optimize the model. To explain the model's results more intuitively, the SHapley Additive exPlanations (SHAP) method was used.
A total of 9034 patients with type 2 diabetes mellitus (T2DM) who met the screening criteria were included, 1118 of whom had DR. RFECV selected 19 features for model construction. Of the 5 models built (GBM, LR, KNN, Random Forest, and SVM), GBM had the highest accuracy (0.85) and AUC (0.934) and was therefore the best prediction model. After grid-search hyperparameter optimization of this model, its accuracy reached 0.88 and its AUC increased to 0.958. SHAP analysis showed that TCM syndrome type, albumin, low-density lipoprotein, triglyceride, total protein, and glycosylated hemoglobin were closely related to an increased risk of DR. TCM syndrome type can therefore be concluded to be a risk factor for DR.
The grid-search-optimized GBM classifier, which uses relevant TCM and western medicine risk factors as variables, can better predict the risk of DR.
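The workflow described in the abstract has three stages: RFECV feature selection, a comparison of five models on accuracy, precision, recall, F1 and AUC, and a grid search over GBM hyperparameters. The Python sketch below illustrates that workflow with scikit-learn. It is not the authors' code. The data are synthetic (`make_classification`, with a positive rate chosen to roughly match 1118/9034), and the RFECV estimator, cross-validation settings and hyperparameter grid are hypothetical, illustrative choices. The SHAP step is omitted.

```python
"""Hypothetical sketch: RFECV -> five-model comparison -> GBM grid search.

Assumptions (not from the study): synthetic data stand in for the EHR features;
the RFECV estimator, CV folds and hyperparameter grid are illustrative only.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the cohort (~12% positives, like 1118/9034).
X, y = make_classification(n_samples=1500, n_features=25, n_informative=8,
                           weights=[0.88], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# 1) RFECV: repeatedly drop the least important features and keep the subset
#    with the best cross-validated AUC (small GBM used here for speed).
selector = RFECV(GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0),
                 step=2, cv=cv, scoring="roc_auc")
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)


def positive_scores(model, X):
    """Score for AUC: P(class 1) when available, otherwise the decision function."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)


def evaluate(model, X, y):
    """The five indicators named in the abstract, computed on held-out data."""
    pred = model.predict(X)
    return {
        "accuracy": accuracy_score(y, pred),
        "precision": precision_score(y, pred, zero_division=0),
        "recall": recall_score(y, pred),
        "f1": f1_score(y, pred, zero_division=0),
        "auc": roc_auc_score(y, positive_scores(model, X)),
    }


# 2) Compare GBM with the four baselines; scale-sensitive models get standardization.
models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}
results = {name: evaluate(m.fit(X_train_sel, y_train), X_test_sel, y_test)
           for name, m in models.items()}

# 3) Grid search over a small, hypothetical GBM hyperparameter grid.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "learning_rate": [0.05, 0.1],
                "max_depth": [2, 3]},
    scoring="roc_auc", cv=cv)
grid.fit(X_train_sel, y_train)
results["GBM (grid search)"] = evaluate(grid.best_estimator_, X_test_sel, y_test)

print("features kept by RFECV:", selector.n_features_)
for name, metrics in results.items():
    print(f"{name:18s}", {k: round(v, 3) for k, v in metrics.items()})
```

Any numbers this prints reflect only the synthetic data and say nothing about the study's results. Each scaler sits inside a pipeline, so it is fit only on the training split. Model selection uses AUC because accuracy alone can look high on a cohort where most patients do not have DR.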