Department of Clinical Medicine, Bengbu Medical University, Bengbu, China.
Department of Oncology Surgery, the Second Affiliated Hospital of Bengbu Medical University, Bengbu, China.
Endocrine. 2024 Aug;85(2):615-625. doi: 10.1007/s12020-024-03735-1. Epub 2024 Feb 23.
To construct a risk prediction model for assisted diagnosis of Diabetic Nephropathy (DN) using machine learning algorithms, and to validate it internally and externally.
First, the data were cleaned and enhanced, then divided into training and test sets at a 7:3 ratio. Next, indicators related to DN were selected using difference analysis and the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Max-Relevance and Min-Redundancy (MRMR) algorithms. Ten machine learning models were constructed from the key variables. The best model was chosen according to the Receiver Operating Characteristic (ROC) curve, the Precision-Recall (PR) curve, accuracy, the Matthews Correlation Coefficient (MCC), and Kappa, and was then validated internally and externally. An online platform was built on the best model.
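The split and feature-selection steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the clinical dataset, approximates LASSO with an L1-penalised logistic regression, and combines the LASSO and RFE results by intersection, a choice the abstract does not specify. Difference analysis and MRMR are omitted, and every hyperparameter is an arbitrary placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical data (assumption: 30 candidate indicators).
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=8, random_state=0
)

# 7:3 split into training and test sets, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardise using training-set statistics only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)

# LASSO-style selection: keep features with non-zero L1-penalised coefficients.
lasso = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)
)
lasso.fit(X_train_s, y_train)
lasso_mask = lasso.get_support()

# Recursive feature elimination down to 15 features, ranked by a random forest.
rfe = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=15,
)
rfe.fit(X_train_s, y_train)
rfe_mask = rfe.get_support()

# Assumption: keep only the features that both methods agree on.
selected = np.flatnonzero(lasso_mask & rfe_mask)
print("LASSO kept:", int(lasso_mask.sum()), "| RFE kept:", int(rfe_mask.sum()))
print("Selected by both:", selected.tolist())
```

Fitting the scaler and both selectors on the training portion only keeps the test set untouched until evaluation.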
Fifteen key variables were selected, and among the 10 machine learning models, the Random Forest model achieved the best predictive performance. In the test set, the area under the ROC curve was 0.912; in the two external validation cohorts, the areas under the ROC curve were 0.828 and 0.863, respectively, indicating excellent predictive and generalization ability.
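For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below shows how they might be computed for a random forest with scikit-learn. It is illustrative only: the data are synthetic, the hyperparameters are placeholders, and average precision is used here as a common summary of the PR curve, which may differ from the summary the authors used. The values it prints bear no relation to the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    cohen_kappa_score,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 15 features (assumption, mirroring the 15 key variables).
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=6, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X_train, y_train)

prob = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
pred = model.predict(X_test)              # hard labels at the default 0.5 threshold

metrics = {
    "roc_auc": roc_auc_score(y_test, prob),           # threshold-free ranking quality
    "pr_auc": average_precision_score(y_test, prob),  # summary of the PR curve
    "accuracy": accuracy_score(y_test, pred),
    "mcc": matthews_corrcoef(y_test, pred),
    "kappa": cohen_kappa_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

ROC and PR summaries use the predicted probabilities, whereas accuracy, MCC, and Kappa depend on the classification threshold applied to them.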
The model has good predictive value and is expected to aid the early diagnosis and screening of DN in clinical practice.