Machine Learning Approaches for Stroke Risk Prediction: Findings from the Suita Study.

Author information

Vu Thien, Kokubo Yoshihiro, Inoue Mai, Yamamoto Masaki, Mohsen Attayeb, Martin-Morales Agustin, Inoué Takao, Dawadi Research, Araki Michihiro

Affiliations

Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-Shinmachi, Settsu 566-0002, Japan.

National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita 564-8565, Japan.

Publication information

J Cardiovasc Dev Dis. 2024 Jul 1;11(7):207. doi: 10.3390/jcdd11070207.

Abstract

Stroke is a significant public health concern because of its impact on mortality and morbidity. This study investigates the utility of machine learning algorithms for predicting stroke and identifying key risk factors, using data from the Suita study comprising 7389 participants and 53 variables. Unsupervised k-prototype clustering first categorized participants into risk clusters, while five supervised models, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were used to predict stroke outcomes. Stroke incidence differed substantially among the risk clusters identified by k-prototype clustering. Supervised learning, particularly RF, was the preferable option because it achieved higher performance metrics. The SHapley Additive exPlanations (SHAP) method identified age, systolic blood pressure, hypertension, estimated glomerular filtration rate, metabolic syndrome, and blood glucose level as key predictors of stroke, in line with the findings of the unsupervised clustering approach in high-risk groups. Additionally, previously unidentified risk factors such as elbow joint thickness, fructosamine, hemoglobin, and calcium level showed potential for stroke prediction. In conclusion, machine learning facilitated accurate stroke risk predictions and highlighted potential biomarkers, offering a data-driven framework for risk assessment and biomarker discovery.
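The abstract does not say how k-prototype clustering handles numeric and categorical variables together. As a minimal sketch, assuming the standard k-prototypes dissimilarity (squared Euclidean distance on numeric features plus a weight `gamma` times the number of categorical mismatches), the assignment step might look like the following. The feature names, prototype values, and `gamma` are hypothetical illustrations, not values from the Suita study, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# A record is (numeric features, categorical features).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(a: Record, b: Record, gamma: float = 1.0) -> float:
    """k-prototypes dissimilarity between two mixed-type records.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching categories, weighted by gamma.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def assign_cluster(record: Record, prototypes: List[Record], gamma: float = 1.0) -> int:
    """Return the index of the prototype closest to the record."""
    distances = [mixed_dissimilarity(record, p, gamma) for p in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical prototypes: standardized (age, systolic blood pressure)
# plus (hypertension, metabolic syndrome) flags. Values are made up.
prototypes: List[Record] = [
    ((-0.5, -0.4), ("no", "no")),   # index 0: illustrative lower-risk prototype
    ((0.8, 0.9), ("yes", "yes")),   # index 1: illustrative higher-risk prototype
]

# Hypothetical participant, not real study data.
participant: Record = ((0.6, 1.0), ("yes", "no"))

cluster = assign_cluster(participant, prototypes, gamma=0.5)
print("assigned cluster index:", cluster)
```

In a full k-prototypes run, this assignment step alternates with updating each prototype (mean of numeric features, mode of categorical features) until assignments stop changing. The choice of `gamma` controls how much categorical mismatches count relative to numeric distance.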

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f34/11276746/ff925664cb2a/jcdd-11-00207-g001.jpg
