Liu Lianhua, Bi Bo, Gui Mei, Zhang Linli, Ju Feng, Wang Xiaodan, Cao Li
Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China.
Department of Mathematics, Physics, and Chemistry teaching, Hainan University, Haikou, Hainan, China.
BMJ Open. 2025 Apr 3;15(4):e092463. doi: 10.1136/bmjopen-2024-092463.
Diabetic peripheral neuropathy (DPN) is a common and serious complication of diabetes, which can lead to foot deformity, ulceration, and even amputation. Early identification is crucial, as more than half of DPN patients are asymptomatic in the early stage. This study aimed to develop and validate multiple risk prediction models for DPN in patients with type 2 diabetes mellitus (T2DM) and to apply the Shapley Additive Explanation (SHAP) method to interpret the best-performing model and identify key risk factors for DPN.
A single-centre retrospective cohort study.
The study was conducted at a tertiary teaching hospital in Hainan.
Data were retrospectively collected from the electronic medical records of patients with diabetes admitted between 1 January 2021 and 28 March 2023. After data preprocessing, 73 variables were retained for baseline analysis. Feature selection was performed using univariate analysis combined with recursive feature elimination (RFE). The dataset was split into training and test sets in an 8:2 ratio, and the training set was balanced using the Synthetic Minority Over-sampling Technique (SMOTE). Six machine learning algorithms were used to develop prediction models for DPN, with hyperparameters optimised by grid search with 10-fold cross-validation. Model performance on the test set was assessed using the area under the receiver operating characteristic curve, accuracy, precision, recall, F1-score and G-mean, and the SHAP method was used to interpret the best-performing model.
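The abstract states only that the training set was balanced with SMOTE. As a minimal sketch, the core SMOTE step can be written in dependency-free Python: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names `smote` and `balance_binary`, the choice k = 5 and the random selection of seed samples are illustrative assumptions, not the authors' implementation; in practice a library class such as `SMOTE` from imbalanced-learn would typically be used. Because 88.6% of patients had DPN, the minority class in this dataset is the non-DPN group.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority-class sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance_binary(X, y, k=5, seed=0):
    """Hypothetical helper: oversample the smaller class of a binary training
    set until both classes have the same number of samples."""
    counts = {label: y.count(label) for label in set(y)}
    minority_label = min(counts, key=counts.get)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(counts.values()) - counts[minority_label]
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

Oversampling is applied only after the 8:2 split and only to the training portion, so that the test set contains no synthetic points derived from training data.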
The study included 3343 T2DM inpatients, with a median age of 60 years (IQR 53-69), and 88.6% (2962/3343) had DPN. The RFE method identified 12 key factors for model construction. Among the six models, XGBoost showed the best predictive performance, achieving an area under the curve of 0.960, accuracy of 0.927, precision of 0.969, recall of 0.948, F1-score of 0.958 and G-mean of 0.850 on the test set. The SHAP analysis highlighted C-reactive protein, total bile acids, gamma-glutamyl transpeptidase, age and lipoprotein(a) as the top five predictors of DPN.
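For reference, the reported metrics can be computed from a binary confusion matrix with DPN as the positive class. The sketch below assumes the usual definition G-mean = sqrt(recall × specificity); the abstract does not define it, and the function names are illustrative. The reported precision (0.969) and recall (0.948) reproduce the reported F1-score, and inverting the G-mean gives an implied specificity of about 0.76. This is a derived, approximate figure (the inputs are rounded), not one reported in the abstract, but under that assumption it suggests the model identified non-DPN patients less reliably than DPN patients.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Test-set metrics from a binary confusion matrix (DPN = positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


def implied_specificity(g_mean, recall):
    """Invert G-mean = sqrt(recall * specificity) to recover specificity."""
    return g_mean ** 2 / recall


# Check the reported test-set figures for XGBoost.
precision, recall = 0.969, 0.948
print(round(2 * precision * recall / (precision + recall), 3))  # 0.958
print(round(implied_specificity(0.850, recall), 3))             # 0.762
```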
The machine learning approach successfully established a DPN risk prediction model with excellent performance on the held-out test set. Interpreting the model with the SHAP method could enhance its clinical applicability.
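SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. To make the definition concrete, the sketch below computes exact Shapley values by enumerating every feature coalition, treating a feature as "absent" by setting it to a baseline value (one common convention; other SHAP estimators handle absent features differently). The toy model, its coefficients and the baseline are hypothetical and unrelated to the study. Exact enumeration grows exponentially with the number of features; for tree ensembles such as XGBoost, the `shap` library provides `TreeExplainer`, which computes SHAP values in polynomial time.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value; all
    2**(n-1) coalitions are enumerated for each feature, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score with one interaction (not the study's model).
def toy_score(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


phi = shapley_values(toy_score, x=[2.0, 3.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 3) for v in phi])  # [1.1, 0.6, 0.1]
```

The attributions sum to predict(x) minus predict(baseline) (the efficiency property), and the interaction term's contribution is split equally between features 0 and 2. Global rankings such as the top five predictors reported above are commonly obtained by averaging absolute SHAP values over patients; the abstract does not state which summary the authors used.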