Machine learning-based diagnosis and risk factor analysis of cardiocerebrovascular disease based on KNHANES.

Affiliations

Department of Family Medicine, Kyung Hee University Hospital, Seoul, Republic of Korea.

Department of Electrical and Electronic Engineering, Hanyang University, Ansan, Korea.

Publication information

Sci Rep. 2022 Feb 10;12(1):2250. doi: 10.1038/s41598-022-06333-1.

DOI:10.1038/s41598-022-06333-1
PMID:35145205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8831514/
Abstract

The prevalence of cardiocerebrovascular disease (CVD) is continuously increasing, and it is the leading cause of human death. Since it is difficult for physicians to screen thousands of people, high-accuracy and interpretable methods are needed. We developed four machine learning-based CVD classifiers (i.e., multi-layer perceptron, support vector machine, random forest, and light gradient boosting) based on the Korea National Health and Nutrition Examination Survey (KNHANES). We resampled and rebalanced KNHANES data using complex sampling weights such that the rebalanced dataset mimics a uniformly sampled dataset from the overall population. For clear risk factor analysis, we removed multicollinear and CVD-irrelevant variables using VIF-based filtering and the Boruta algorithm. We applied the synthetic minority oversampling technique and random undersampling before ML training. We demonstrated that the proposed classifiers achieved excellent performance with AUCs over 0.853. Using Shapley value-based risk factor analysis, we identified that the most significant risk factors of CVD were age, sex, and the prevalence of hypertension. Additionally, we identified that age, hypertension, and BMI were positively correlated with CVD prevalence, while sex (female), alcohol consumption, and monthly income were negatively correlated. The results showed that the feature selection and the class balancing technique effectively improve the interpretability of models.
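
To make the Shapley value-based risk factor analysis mentioned in the abstract more concrete, the following is a minimal sketch that computes exact Shapley values by enumerating feature orderings. It is not the authors' implementation: the function `toy_risk`, its coefficients, the feature values, and the baseline are all hypothetical and chosen only for illustration. Practical analyses with many features usually rely on approximations rather than full enumeration.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features are switched from their baseline value to their value in x
    one at a time, and each feature's marginal contribution is averaged
    over all possible orderings.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = x[i]
            new_output = predict(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    n_orderings = factorial(n)
    return [total / n_orderings for total in phi]


def toy_risk(features):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    age, female, hypertension = features
    return (0.02 * age
            + 0.3 * hypertension
            - 0.1 * female
            + 0.01 * age * hypertension)  # interaction term


# Hypothetical individual and reference point: [age, female, hypertension]
x = [70, 0, 1]
baseline = [50, 1, 0]

phi = shapley_values(toy_risk, x, baseline)
for name, value in zip(["age", "sex (female)", "hypertension"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, and the interaction between age and hypertension is shared between those two features.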

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae9b/8831514/ee1bc9df257b/41598_2022_6333_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae9b/8831514/29d9fba9862f/41598_2022_6333_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae9b/8831514/922231bf87b1/41598_2022_6333_Fig2_HTML.jpg
