Ding Chao, Yuan Minjia, Cheng Jiwei, Wen Junkai
Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
Aviation Health Department, Spring Airlines Co., Ltd., Shanghai, China.
Front Physiol. 2025 Mar 24;16:1528910. doi: 10.3389/fphys.2025.1528910. eCollection 2025.
Stroke, a major global health concern, is responsible for high mortality and long-term disability. With an aging population and an increasing prevalence of risk factors, its incidence is rising. Existing risk assessment tools have limitations, and more accurate, personalized stroke risk prediction models are needed. Smoking is a significant modifiable risk factor, yet current models have not comprehensively examined different types of tobacco use.
Data were sourced from the 2015-2018 National Health and Nutrition Examination Survey (NHANES) and the 2020-2021 Behavioral Risk Factor Surveillance System (BRFSS). Tobacco use (including combustible cigarettes and e-cigarettes) and stroke history were obtained through questionnaires. Participants were divided into four subgroups: non-smokers, exclusive combustible cigarette users, exclusive e-cigarette users, and dual users. Covariates such as age, sex, race, education, and health conditions were also collected. Multivariate logistic regression was used to analyze the relationship between smoking and stroke. Four machine-learning models (XGBoost, logistic regression, Random Forest, and Gaussian Naive Bayes) were evaluated using the area under the receiver operating characteristic curve (AUC), and the SHapley Additive exPlanations (SHAP) method was applied for feature importance ranking and model interpretation.
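The subgroup definition and the AUC metric described above can be illustrated with a minimal sketch. The function and parameter names (`tobacco_group`, `roc_auc`, `smokes_cigarettes`, `uses_ecigarettes`) are hypothetical and are not taken from the study; the actual NHANES and BRFSS questionnaire coding is not reproduced here. The AUC is computed from its pairwise definition (the probability that a randomly chosen stroke case receives a higher predicted score than a randomly chosen non-case), which is simple to read but far too slow for a cohort of this size.

```python
from typing import Sequence


def tobacco_group(smokes_cigarettes: bool, uses_ecigarettes: bool) -> str:
    """Assign one of the four exposure subgroups (hypothetical coding)."""
    if smokes_cigarettes and uses_ecigarettes:
        return "dual user"
    if smokes_cigarettes:
        return "exclusive combustible cigarette user"
    if uses_ecigarettes:
        return "exclusive e-cigarette user"
    return "non-smoker"


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives.

    A positive/negative pair counts 1 if the positive has the higher
    score, 0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(tobacco_group(True, False))
    # Toy labels (1 = stroke history) and predicted scores, not study data.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for comparing the four models.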
A total of 273,028 individuals were included in the study. Exclusive combustible cigarette users had an elevated stroke risk (β: 1.36, 95% CI: 1.26-1.47, P < 0.0001). Among the four machine-learning models, the XGBoost model showed the best discriminative ability, with an AUC of 0.794 (95% CI: 0.787-0.802).
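The abstract does not state how the confidence interval around the AUC was obtained. One common approach, shown here purely as an assumption and not as the authors' method, is a percentile bootstrap: resample participants with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. The names `bootstrap_auc_ci`, `n_boot`, and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """95% percentile-bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no bootstrap resample contained both classes")
    aucs.sort()
    lower = aucs[int(0.025 * len(aucs))]
    upper = aucs[min(len(aucs) - 1, int(0.975 * len(aucs)))]
    return lower, upper


if __name__ == "__main__":
    # Toy data for illustration only, not study data.
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    s = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8, 0.35, 0.9]
    print(bootstrap_auc_ci(y, s, n_boot=500, seed=42))
```

With a sample in the hundreds of thousands, an interval as narrow as the one reported is plausible under this kind of resampling, although the pairwise AUC used here would need to be replaced by a rank-based implementation to run in reasonable time.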
This study reveals a significant association between smoking types and stroke risk. An XGBoost-based stroke prediction model was established, which has the potential to improve the accuracy of stroke risk assessment and contribute to personalized interventions for stroke prevention, thus alleviating the healthcare burden related to stroke.