Development and validation of prediction models for hypertension risks: A cross-sectional study based on 4,287,407 participants.

Authors

Ji Weidong, Zhang Yushan, Cheng Yinlin, Wang Yushan, Zhou Yi

Affiliations

Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.

Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China.

Publication information

Front Cardiovasc Med. 2022 Sep 26;9:928948. doi: 10.3389/fcvm.2022.928948. eCollection 2022.

Abstract

OBJECTIVE

To develop an optimal screening model for identifying individuals at high risk of hypertension in China by comparing tree-based machine learning models (classification and regression tree, random forest, AdaBoost with a decision tree, and extreme gradient boosting decision tree) with other machine learning models (artificial neural network and naive Bayes) and with traditional logistic regression.

METHODS

A total of 4,287,407 adults who participated in the national physical examination were included in the study. Features were selected using least absolute shrinkage and selection operator (LASSO) regression, and the borderline synthetic minority over-sampling technique (Borderline-SMOTE) was used to balance the data. Non-laboratory and semi-laboratory analyses were carried out using the selected features. Tree-based machine learning models, other machine learning models, and traditional logistic regression models were each constructed to identify individuals with hypertension. The top features selected by the best-performing algorithm were visualized together with their variable importance scores.
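
The data-balancing step can be illustrated with a small sketch of the borderline over-sampling idea: minority samples whose nearest neighbours are mostly (but not entirely) majority-class are treated as borderline, and new minority samples are interpolated between those points and other minority samples. This is a toy, standard-library-only illustration and not the authors' implementation; the function names (`k_nearest`, `danger_points`, `borderline_smote`), the default `k`, and the default number of generated samples are assumptions made for this example. In practice a library implementation such as `BorderlineSMOTE` from imbalanced-learn would normally be used.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def danger_points(X, y, minority=1, k=5):
    """Minority indices whose neighbourhood is mostly, but not entirely, majority."""
    danger = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        neigh = k_nearest(i, X, k)
        n_major = sum(1 for j in neigh if y[j] != minority)
        # Borderline rule: at least half of the neighbours are majority-class,
        # but not all of them (all-majority points are treated as noise).
        if neigh and len(neigh) / 2 <= n_major < len(neigh):
            danger.append(i)
    return danger


def borderline_smote(X, y, minority=1, k=5, n_new=None, seed=0):
    """Return synthetic minority points interpolated from borderline samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    danger = danger_points(X, y, minority, k)
    if len(min_idx) < 2 or not danger:
        return []
    if n_new is None:
        # Assumed default: add enough points to match the majority-class count
        # (binary labels assumed).
        n_new = max(0, len(y) - 2 * len(min_idx))
    min_points = [X[i] for i in min_idx]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        # Pick a partner among the nearest minority-class neighbours.
        neigh = k_nearest(min_idx.index(i), min_points, k)
        partner = min_points[rng.choice(neigh)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], partner)))
    return synthetic
```

Calling `borderline_smote(X, y, minority=1, k=3)` on a list of feature tuples `X` and binary labels `y` returns only the new synthetic points, which would then be appended to the training set with the minority label.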

RESULTS

A total of 24 variables were retained for analysis after least absolute shrinkage and selection operator regression. The borderline synthetic minority over-sampling technique expanded the number of hypertensive patients in the training set from 689,025 to 2,312,160. The extreme gradient boosting decision tree algorithm performed best, with an area under the receiver operating characteristic curve (AUC) of 0.893 in the non-laboratory analysis and 0.894 in the semi-laboratory analysis. Age, systolic blood pressure, waist circumference, diastolic blood pressure, albumin, drinking frequency, electrocardiogram, ethnicity (Uyghur, Hui, and other), body mass index, sex (female), exercise frequency, diabetes mellitus, and total bilirubin were identified as important factors reflecting hypertension. In addition, for some algorithms the semi-laboratory analyses yielded only a small improvement in predictive performance over the non-laboratory analyses.
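
For readers less familiar with the metric reported above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study, and the pairwise loop is meant for clarity rather than for datasets of this study's size.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive, e.g. hypertensive).
    scores: predicted scores, higher meaning more likely positive.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.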

CONCLUSION

Combining multiple methods makes it possible to build a more meaningful prediction model, one that identifies risk factors and provides new insights into the prediction and prevention of hypertension.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c155/9548597/8d3a26afa335/fcvm-09-928948-g0001.jpg
