
Stroke Prediction with Machine Learning Methods among Older Chinese.

Affiliations

The State Key Laboratory of Molecular Vaccine and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen 361102, China.

Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen 361102, China.

Publication information

Int J Environ Res Public Health. 2020 Mar 12;17(6):1828. doi: 10.3390/ijerph17061828.

DOI:10.3390/ijerph17061828
PMID:32178250
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7142983/
Abstract

Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set, with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73-0.83) for RF and 0.72 (95% CI, 0.71-0.73) for RLR, respectively. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. Drinking; age and high-sensitivity C-reactive protein level; and low-density lipoprotein cholesterol level were also included in RLR, SVM, and RF, respectively. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
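
The three balancing techniques named in the abstract, and the reason a model can look accurate on the imbalanced data while detecting no strokes, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not describe their implementation, so everything below is a standard-library Python approximation written for illustration. Only the class counts (56 stroke, 1075 non-stroke) come from the abstract; the two toy features, the helper names, the random seed, and the choice of k = 5 neighbours for SMOTE are assumptions.

```python
import random


def _split(y):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_over_sample(X, y, rng):
    """ROS: duplicate random minority rows until both classes are the same size."""
    minority, majority = _split(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_under_sample(X, y, rng):
    """RUS: keep only a random majority subset as large as the minority class."""
    minority, majority = _split(y)
    idx = rng.sample(majority, len(minority)) + minority
    return [X[i] for i in idx], [y[i] for i in idx]


def smote(X, y, rng, k=5):
    """SMOTE: add synthetic minority rows interpolated toward a near neighbour."""
    minority, majority = _split(y)
    label = y[minority[0]]
    new_X, new_y = [list(row) for row in X], list(y)
    for _ in range(len(majority) - len(minority)):
        i = rng.choice(minority)
        # k nearest minority neighbours of row i (squared Euclidean distance)
        neighbours = sorted(
            (j for j in minority if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from row i to row j
        new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        new_y.append(label)
    return new_X, new_y


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on strokes), specificity (recall on non-strokes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)  # arbitrary seed, for reproducibility only
y = [1] * 56 + [0] * 1075  # class counts from the abstract
# Toy features (assumption): two Gaussian columns, the first shifted for strokes.
X = [[rng.gauss(1.0 if label else 0.0, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# A degenerate model that never predicts stroke and gives everyone the same score.
baseline = confusion_metrics(y, [0] * len(y))
baseline_auc = auc(y, [0.0] * len(y))

X_ros, y_ros = random_over_sample(X, y, rng)
X_rus, y_rus = random_under_sample(X, y, rng)
X_sm, y_sm = smote(X, y, rng)

print(baseline, baseline_auc)
print(len(y_ros), sum(y_ros), len(y_rus), sum(y_rus), len(y_sm), sum(y_sm))
```

With these counts the never-predict-stroke baseline reaches an accuracy of 1075/1131 (about 0.95) with sensitivity 0.0, specificity 1.0, and AUC 0.5, the same pattern the abstract reports for the imbalanced data set. ROS and SMOTE each grow the data to 2150 rows (1075 per class), while RUS shrinks it to 112 rows (56 per class). In practice, resampling of this kind is applied to the training portion only, so that evaluation metrics are computed on data with the original class ratio.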

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6855/7142983/431e138af6b9/ijerph-17-01828-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6855/7142983/200ab1a5624c/ijerph-17-01828-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6855/7142983/e7a29aaf4258/ijerph-17-01828-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6855/7142983/7fcdc2201d60/ijerph-17-01828-g004.jpg
