Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.
Science Island Branch of Graduate School, University of Science and Technology of China, Hefei, China.
Front Public Health. 2021 Sep 24;9:619429. doi: 10.3389/fpubh.2021.619429. eCollection 2021.
Hypertension is a widespread chronic disease. Predicting the risk of hypertension is an intervention that supports its early prevention and management, and implementing such an intervention requires a hypertension risk prediction model that is both effective and easy to deploy. This study evaluated and compared the performance of four machine learning algorithms in predicting hypertension risk from easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examinations was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, using the selected features, we applied 10-fold cross-validation on the training set to find the best hyper-parameters for four models: random forest (RF), CatBoost, a multilayer perceptron (MLP) neural network and logistic regression (LR). Finally, model performance was evaluated on the test set in terms of the area under the ROC curve (AUC), accuracy, sensitivity and specificity. The RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83 and a specificity of 0.81. In addition, body mass index (BMI), age, family history and waist circumference (WC) were identified as the four primary risk factors of hypertension. These findings indicate that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to prevent and manage hypertension in large populations.
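The evaluation protocol described in the abstract (10-fold cross-validation on the training set, then AUC, accuracy, sensitivity and specificity on the test set) can be sketched in plain Python. This is a minimal illustration, not the authors' code. The helper names (`k_fold_indices`, `roc_auc`, `evaluate`), the 0.5 decision threshold and the unstratified, shuffled fold assignment are all assumptions. The fitted models (RF, CatBoost, MLP, LR) are omitted; any classifier that outputs a predicted probability of hypertension could supply the `scores`.

```python
import random
from typing import Dict, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: shuffle sample indices and deal them into k folds.

    Assumption: plain (unstratified) k-fold; the paper only states 10-fold CV.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank statistic (tied scores share an average rank)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over a run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Test-set metrics; a score >= threshold (assumed 0.5) counts as 'hypertensive'."""
    auc = roc_auc(y_true, scores)  # also guarantees both classes are present
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "auc": auc,
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    print(evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this sketch, hyper-parameter search would loop over candidate settings, train on nine folds, score the held-out fold with `roc_auc`, and keep the setting with the best mean score before a single final evaluation on the test set. Computing AUC from ranks avoids building the ROC curve explicitly and handles tied scores consistently.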