Machine learning algorithms to predict stroke in China based on causal inference of time series analysis.

Author information

Zheng Qizhi, Zhao Ayang, Wang Xinzhu, Bai Yanhong, Wang Zikun, Wang Xiuying, Zeng Xianzhang, Dong Guanghui

Affiliations

College of Computer and Control Engineering, Northeast Forestry University, No.26, Hexing Road, Xiangfang District, Harbin, 150040, China.

School of Medicine and Health, Key Laboratory of Micro-systems and Micro-structures Manufacturing (Ministry of Education), Harbin Institute of Technology, Harbin, 150001, China.

Publication information

BMC Neurol. 2025 May 31;25(1):236. doi: 10.1186/s12883-025-04261-x.

Abstract

IMPORTANCE

Identifying and managing high-risk populations for stroke in a targeted manner is a key area of preventive healthcare.

OBJECTIVE

To assess machine learning (ML) models combined with causal inference from time series analysis for building a clinically meaningful stroke prediction model.

DESIGN

This retrospective cohort study used data from the China Health and Retirement Longitudinal Study (CHARLS), which assessed 11,789 adults in China from 2011 to 2018. Data analysis was performed from June 1 to December 1, 2024.

SETTING

CHARLS uses a multi-stage probability sampling method covering 28 provinces and collects data every two years through computer-assisted personal interviewing (CAPI).

PARTICIPANTS

This study combined a Vector Autoregression (VAR) model with Graph Neural Networks (GNN) to systematically construct dynamic causal inference features. Several classic classification algorithms were compared: Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting, and Multi-Layer Perceptron (MLP). The Synthetic Minority Oversampling Technique (SMOTE) was used to oversample the minority class, and stratified K-fold cross-validation was employed.
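SMOTE balances classes by synthesizing new minority-class samples, and stratified K-fold keeps the class ratio the same in every fold. The sketch below illustrates one common way to combine the two: resample only the training portion of each fold so the held-out fold keeps its real class ratio. It is a pure-Python illustration on made-up two-feature data, not the study's pipeline; the function names `stratified_kfold` and `smote`, the 40/10 class split, and all parameters are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class out round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def smote(X, y, minority=1, k_neighbors=3, seed=0):
    """Oversample the minority class (binary labels assumed) by interpolating
    between a minority point and one of its nearest minority neighbours."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    if len(minority_pts) < 2 or n_needed <= 0:
        return list(X), list(y)
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        # Rank the other minority points by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return list(X) + synthetic, list(y) + [minority] * n_needed


# Toy imbalanced data (hypothetical): 40 negatives, 10 positives.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

for train_idx, test_idx in stratified_kfold(y, k=5):
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    # Resample the training fold only; the test fold stays untouched.
    X_bal, y_bal = smote(X_train, y_train)
    print(len(y_train), "training rows ->", len(y_bal), "after SMOTE;",
          "test positives:", sum(y[i] for i in test_idx))
```

Resampling inside the loop, rather than before splitting, keeps synthetic points derived from a test sample out of the training data.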

MAIN OUTCOME(S) AND MEASURE(S)

AUC (Area Under the Curve), Accuracy, Precision, Recall, F1 Score, and Matthews Correlation Coefficient (MCC).
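For reference, the listed measures can all be computed from predicted labels and scores using their standard definitions. The following is a minimal sketch on a made-up eight-sample example; the helper names `classification_metrics` and `roc_auc`, the 0.5 threshold, and the data are assumptions for illustration and do not come from the study.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

AUC is computed from the raw scores and does not depend on the threshold, while the other five measures change with it. MCC uses all four confusion-matrix cells, which makes it informative when the classes are imbalanced.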

RESULTS

This study included 11,789 participants: 6,334 females (53.73%) and 5,455 males (46.27%), with a mean age of 65 years. Introducing dynamic causal inference features significantly improved the performance of almost all models. The area under the ROC curve ranged from 0.78 to 0.83 across models, with statistically significant differences (P < 0.01). Among all models, Gradient Boosting showed the highest performance and stability. Model explanation and feature importance analysis identified the significant contributors to stroke risk.

CONCLUSIONS AND RELEVANCE

This study proposes a stroke risk prediction method that combines dynamic causal inference with machine learning models, significantly improving prediction accuracy and revealing key health factors that affect stroke. The results indicate that dynamic causal inference features are valuable for predicting stroke risk, especially for capturing how changes in health status over time affect that risk. Through further model optimization and the introduction of more variables, this study provides a theoretical basis and practical guidance for future stroke prevention and intervention strategies.

TRIAL REGISTRATION

IRB00001052-11015.1.2.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb70/12125820/3a19eac43abb/12883_2025_4261_Fig1_HTML.jpg
