Development and validation of prediction models for hypertension risks: A cross-sectional study based on 4,287,407 participants.

Author information

Ji Weidong, Zhang Yushan, Cheng Yinlin, Wang Yushan, Zhou Yi

Affiliations

Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.

Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China.

Publication information

Front Cardiovasc Med. 2022 Sep 26;9:928948. doi: 10.3389/fcvm.2022.928948. eCollection 2022.

DOI:10.3389/fcvm.2022.928948

PMID:36225955

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9548597/

Abstract

OBJECTIVE

To develop an optimal screening model for identifying individuals at high risk of hypertension in China by comparing tree-based machine learning models (such as classification and regression tree, random forest, AdaBoost with a decision tree, and extreme gradient boosting decision tree), other machine learning models (such as artificial neural network and naive Bayes), and traditional logistic regression models.

METHODS

A total of 4,287,407 adults who took part in the national physical examination were included in the study. Features were selected using least absolute shrinkage and selection operator (LASSO) regression. The borderline synthetic minority over-sampling technique (Borderline-SMOTE) was used to balance the classes. Non-laboratory and semi-laboratory analyses were carried out on the selected features. Tree-based machine learning models, other machine learning models, and traditional logistic regression models were each constructed to identify individuals with hypertension. The top features from the best-performing algorithm were visualized together with their variable importance scores.
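The oversampling step can be illustrated with a small sketch. The abstract does not say which Borderline-SMOTE variant or software was used, so the code below is an illustration of the general idea (in the style of the Borderline-SMOTE1 variant) and not the study's implementation. Minority samples whose nearest neighbours are mostly, but not entirely, majority samples are treated as borderline, and synthetic points are interpolated between each borderline sample and a nearby minority sample. The function names, the parameters `m` and `k`, and the toy coordinates are all hypothetical.

```python
import math
import random


def find_borderline(minority, majority, m=5):
    """Return indices of minority points whose m nearest neighbours
    (searched over all samples) are mostly, but not all, majority points."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for i, p in enumerate(minority):
        # minority[i] sits at position i in `labelled`, so skip that position
        others = [t for j, t in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda t: math.dist(p, t[0]))[:m]
        n_major = sum(1 for _, label in nearest if label == 0)
        if m / 2 <= n_major < m:  # n_major == m would be treated as noise
            danger.append(i)
    return danger


def borderline_smote(minority, majority, n_new, m=5, k=5, seed=0):
    """Generate n_new synthetic minority points near the class border."""
    rng = random.Random(seed)
    danger = find_borderline(minority, majority, m)
    if not danger or len(minority) < 2:
        return []
    synthetic = []
    while len(synthetic) < n_new:
        i = rng.choice(danger)
        p = minority[i]
        # k nearest minority neighbours of the chosen borderline point
        peers = [q for j, q in enumerate(minority) if j != i]
        neighbours = sorted(peers, key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy 2-D data, made up for illustration (1 = hypertensive would be the minority)
minority = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (1.0, 1.0), (1.1, 1.0)]
majority = [(1.2, 1.1), (1.0, 1.2), (1.3, 1.0), (3.0, 3.0), (3.1, 3.0)]
new_points = borderline_smote(minority, majority, n_new=4, m=3, k=1)
print(find_borderline(minority, majority, m=3), new_points)
```

Only the two minority points that sit next to the majority cluster are flagged as borderline here, so every synthetic point falls between them. For real data, the imbalanced-learn package provides a `BorderlineSMOTE` class; whether the authors used it is not stated in the abstract.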

RESULTS

After least absolute shrinkage and selection operator (LASSO) regression, 24 variables were retained for analysis. The borderline synthetic minority over-sampling technique (Borderline-SMOTE) expanded the number of hypertensive patients in the training set from 689,025 to 2,312,160. The extreme gradient boosting decision tree algorithm performed best, with an area under the receiver operating characteristic curve (AUC) of 0.893 in the non-laboratory analysis and 0.894 in the semi-laboratory analysis. Age, systolic blood pressure, waist circumference, diastolic blood pressure, albumin, drinking frequency, electrocardiogram, ethnicity (Uyghur, Hui, and other), body mass index, sex (female), exercise frequency, diabetes mellitus, and total bilirubin were identified as important factors reflecting hypertension. In addition, for some algorithms, the semi-laboratory analyses showed little improvement in predictive performance over the non-laboratory analyses.
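For readers unfamiliar with the metric used to rank the models, the area under the receiver operating characteristic curve (AUC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are hypothetical and unrelated to the study data, and the quadratic loop is only suitable for small examples.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive, e.g. hypertensive)
    scores: iterable of model scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this reading, an AUC of 0.893 means that the model scores a hypertensive participant above a non-hypertensive one in roughly 89% of such pairs.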

CONCLUSION

Combining multiple methods yields a more informative prediction model, one that identifies risk factors and offers new insights into the prediction and prevention of hypertension.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c155/9548597/8d3a26afa335/fcvm-09-928948-g0001.jpg
