Predicting all-cause mortality and premature death using interpretable machine learning among a middle-aged and elderly Chinese population.

Author information

Yu Qi, Zhang Lingzhi, Ma Qian, Da Lijuan, Li Jiahui, Li Wenyuan

Affiliation

Center of Clinical Big Data and Analytics of The Second Affiliated Hospital and Department of Big Data in Health Science School of Public Health, Zhejiang University School of Medicine, Hangzhou, 310058, Zhejiang, China.

Publication information

Heliyon. 2024 Aug 28;10(17):e36878. doi: 10.1016/j.heliyon.2024.e36878. eCollection 2024 Sep 15.

Abstract

OBJECTIVE

To develop machine learning-based prediction models for all-cause and premature mortality among the middle-aged and elderly population in China.

METHOD

Adults aged 45 years or older at the 2011 baseline of the China Health and Retirement Longitudinal Study (CHARLS) were included. A stacked ensemble model was built from five selected machine learning algorithms. The models were trained and tested on the CHARLS 2011-2015 cohort (derivation cohort) and then externally validated on the CHARLS 2015-2018 cohort (validation cohort). SHapley Additive exPlanations (SHAP) were used to quantify the importance of risk factors and to explain the machine learning models.
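
Stacking combines several base learners by training a meta-learner on their out-of-fold predictions. The sketch below illustrates that general technique using only the Python standard library. It is not the authors' pipeline. The abstract does not name the five algorithms, so everything in the sketch is a hypothetical choice: the two base learners (a gradient-descent logistic regression and a k-nearest-neighbour risk score), the logistic meta-learner, the 5-fold split, and all function names.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Hypothetical base/meta learner: batch gradient-descent logistic regression."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    # Return a predict function mapping one feature row to a probability
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_knn(X, y, k=5):
    """Hypothetical base learner: share of deaths among the k nearest training rows."""
    def predict(x):
        nearest = sorted(
            (sum((a - c) ** 2 for a, c in zip(x, xi)), yi) for xi, yi in zip(X, y)
        )[:k]
        return sum(yi for _, yi in nearest) / k
    return predict


def fit_stack(X, y, learners, n_folds=5, seed=0):
    """Stacked ensemble: a meta-learner trained on out-of-fold base predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    meta_X = [[0.0] * len(learners) for _ in X]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        for j, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                # Each row's meta-feature comes from a model that never saw that row
                meta_X[i][j] = model(X[i])
    meta_model = fit_logistic(meta_X, y)
    # Refit every base learner on the full derivation data for later scoring
    base_models = [fit(X, y) for fit in learners]
    return lambda x: meta_model([m(x) for m in base_models])
```

Under the design described above, a model fit this way on the 2011-2015 derivation cohort would then be scored, without refitting, on the 2015-2018 validation cohort.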

RESULT

In the derivation cohort, 10,677 subjects were included, of whom 478 died during follow-up. The stacked ensemble model showed the highest discriminative performance for predicting all-cause mortality and premature death, with an AUC [95% CI] of 0.826 [0.792-0.859] and 0.773 [0.725-0.821], respectively. In the validation cohort, the corresponding AUCs [95% CI] were 0.803 [0.743-0.864] and 0.791 [0.719-0.863], respectively. Age, sex, self-reported health, activities of daily living, cognitive function, ever smoking, and systolic blood pressure, cystatin C, and low-density lipoprotein levels were strong predictors of both all-cause mortality and premature death.
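
Each AUC above summarises discrimination. It is the probability that a randomly chosen participant who died received a higher predicted risk than one who survived. The abstract does not state how the 95% CIs were obtained. The sketch below therefore assumes a percentile bootstrap; DeLong's method is a common alternative. The function names, the 1,000 resamples, and the seed are illustrative choices.

```python
import random


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney rank-sum identity; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank for this block of ties
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # AUC is undefined unless both outcomes appear
            stats.append(auc([scores[i] for i in idx], resampled))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lower, upper
```

The rank-based formula avoids comparing every death with every survivor. That matters when each bootstrap resample is as large as the roughly 10,000-row derivation cohort.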

CONCLUSION

Stacked ensemble models performed well in predicting all-cause mortality and premature death in this Chinese cohort. Interpretable techniques can help identify important risk factors and non-linear relationships between predictors and mortality.
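
SHAP attributes each individual prediction to the predictors using Shapley values. Each feature receives its marginal contribution to the prediction, averaged over all coalitions of the other features. The brute-force sketch below computes that quantity exactly for a toy model. As an assumed convention, it fills in "absent" features from background rows. The `shap` package estimates the same values far more efficiently, and the function name and toy models are hypothetical. The enumeration grows as 2^n in the number of features, so it is only practical for a handful of predictors.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of f at x; features outside a coalition take background values."""
    n = len(x)

    def value(coalition):
        # Average model output with coalition features fixed at x, the rest from each background row
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(n)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every coalition S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

Consider a purely interactive toy model such as `f(z) = z[0] * z[1]` with an all-zero background. The two features split the prediction equally. In every case, the values add up to the gap between the prediction and the average background prediction (the efficiency property).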


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/88b2/11399635/b56b7b801d62/gr1.jpg