Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway.
Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway.
Sci Rep. 2024 Mar 7;14(1):5609. doi: 10.1038/s41598-024-56170-7.
In this study, we aimed to create an 11-year hypertension risk prediction model using data from the Trøndelag Health (HUNT) Study in Norway, involving 17 852 individuals (20-85 years; 38% male; 24% incidence rate) with blood pressure (BP) below the hypertension threshold at baseline (1995-1997). We assessed 18 clinical, behavioral, and socioeconomic features, employing machine learning models such as eXtreme Gradient Boosting (XGBoost), Elastic regression, K-Nearest Neighbor, Support Vector Machines (SVM) and Random Forest. For comparison, we used logistic regression and a decision rule as reference models and validated six external models, with a focus on the Framingham risk model. The top-performing models consistently included XGBoost, Elastic regression and SVM. These models efficiently identified hypertension risk, even among individuals with optimal baseline BP (< 120/80 mmHg), although the improvement over the reference models was modest. The recalibrated Framingham risk model outperformed the reference models, approaching the best-performing ML models. Important features included age, systolic and diastolic BP, body mass index, height, and family history of hypertension. In conclusion, our study demonstrated that linear effects sufficed for a well-performing model. The best models efficiently predicted hypertension risk, even among those with optimal or normal baseline BP, using few features. The recalibrated Framingham risk model proved effective in our cohort.
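The abstract does not say how the Framingham risk model was recalibrated. As an illustration only, the sketch below shows one common approach, recalibration-in-the-large: a single intercept shift on the logit scale is chosen so that the mean predicted risk matches the incidence observed in the new cohort. The function name `recalibrate_in_the_large` and the input risks are hypothetical and are not taken from the study; only the 24% target incidence comes from the abstract.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_in_the_large(predicted_risks, observed_incidence, tol=1e-10):
    """Shift all predictions by one constant on the logit scale.

    Returns (delta, recalibrated_risks), where delta is chosen by bisection
    so that the mean recalibrated risk equals observed_incidence.
    Assumes every predicted risk lies strictly between 0 and 1.
    """
    logits = [logit(p) for p in predicted_risks]

    def mean_risk(delta):
        return sum(sigmoid(z + delta) for z in logits) / len(logits)

    # mean_risk is strictly increasing in delta, so bisection converges.
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < observed_incidence:
            lo = mid
        else:
            hi = mid
    delta = (lo + hi) / 2.0
    return delta, [sigmoid(z + delta) for z in logits]


# Hypothetical risks from an external model (illustrative, not study data).
example_risks = [0.05, 0.10, 0.15, 0.30, 0.40]
# 0.24 mirrors the 24% incidence rate reported in the abstract.
shift, recalibrated = recalibrate_in_the_large(example_risks, observed_incidence=0.24)
```

Because the shift is a single constant on the logit scale, it changes calibration but leaves the ranking of individuals, and therefore discrimination measures such as the C-statistic, unchanged.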