Department of Health Statistics, Fourth Military Medical University, Xi'an, China.
School of Health Management, Xi'an Medical University, Xi'an, China.
J Clin Lab Anal. 2020 Sep;34(9):e23421. doi: 10.1002/jcla.23421. Epub 2020 Jul 29.
To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests.
A retrospective study involving 498 subjects was conducted at Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to select the variables that contributed most to CVD prediction and to establish a prediction model. The important variables were then entered into a multifactorial logistic regression analysis. The area under the curve (AUC) was compared between the logistic regression model and the random forest model.
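The AUC used for this comparison can be illustrated with a minimal sketch. It assumes binary outcome labels (1 = CVD, 0 = no CVD) and one predicted score per subject, and computes the AUC as the fraction of case/non-case pairs in which the case receives the higher score. The function name `auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive subject
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(toy_labels, toy_scores)
```

Comparing two models then amounts to calling this function once per model on the same subjects and outcome labels.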
The random forest model identified age, body mass index (BMI), fasting blood glucose (FBG), diastolic blood pressure (DBP), triglyceride (TG), systolic blood pressure (SBP), total cholesterol (TC), waist circumference, and high-density lipoprotein-cholesterol (HDL-C) as the more important variables for CVD prediction; its AUC for CVD prediction was 0.802. Multifactorial logistic regression analysis indicated that the risk factors for CVD were age [odds ratio (OR): 1.14, 95% confidence interval (CI): 1.10-1.17, P < .001], BMI (OR: 1.13, 95% CI: 1.06-1.20, P < .001), TG (OR: 1.11, 95% CI: 1.02-1.22, P = .023), and DBP (OR: 1.04, 95% CI: 1.02-1.06, P = .001); its AUC for CVD prediction was 0.843. The established logistic regression prediction model was Logit P = Log[P/(1 - P)] = -11.47 + 0.13 × age + 0.12 × BMI + 0.11 × TG + 0.04 × DBP; P = 1/[1 + exp(-Logit P)]. Subjects with P > .51 were considered prone to developing CVD.
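The reported logistic regression equation can be evaluated directly, as in the following minimal sketch. The coefficients, intercept, and the .51 cutoff are taken from the text above; the function names, the example subject, and the units (age in years, BMI in kg/m², TG in mmol/L, DBP in mmHg) are assumptions, since the abstract does not state them.

```python
import math

# Intercept, coefficients, and cutoff as reported in the abstract.
INTERCEPT = -11.47
COEFFICIENTS = {"age": 0.13, "bmi": 0.12, "tg": 0.11, "dbp": 0.04}
CUTOFF = 0.51


def cvd_logit(age, bmi, tg, dbp):
    """Linear predictor: Logit P = -11.47 + 0.13*age + 0.12*BMI + 0.11*TG + 0.04*DBP."""
    return (
        INTERCEPT
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["bmi"] * bmi
        + COEFFICIENTS["tg"] * tg
        + COEFFICIENTS["dbp"] * dbp
    )


def cvd_probability(age, bmi, tg, dbp):
    """Predicted probability: P = 1 / (1 + exp(-Logit P))."""
    return 1.0 / (1.0 + math.exp(-cvd_logit(age, bmi, tg, dbp)))


def prone_to_cvd(age, bmi, tg, dbp):
    """Apply the reported cutoff: P > .51 is flagged as prone to CVD."""
    return cvd_probability(age, bmi, tg, dbp) > CUTOFF


# Hypothetical subject; units are assumed, not stated in the abstract.
example_logit = cvd_logit(age=60, bmi=28.0, tg=2.0, dbp=90)
example_flag = prone_to_cvd(age=60, bmi=28.0, tg=2.0, dbp=90)
```

For this hypothetical subject the linear predictor is -11.47 + 7.80 + 3.36 + 0.22 + 3.60 = 3.51, which gives a probability well above the .51 cutoff.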
A prediction model for CVD in the general population was developed based on random forests, providing a simple tool for the early prediction of CVD.