Development and validation of a stacking ensemble model for death prediction in the Chinese Longitudinal Healthy Longevity Survey (CLHLS).

Author information

Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.

School of Public Health, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.

Publication information

Maturitas. 2024 Apr;182:107919. doi: 10.1016/j.maturitas.2024.107919. Epub 2024 Jan 19.

DOI:10.1016/j.maturitas.2024.107919

PMID:38290423

Abstract

OBJECTIVE

This study aimed to develop and validate a mortality risk prediction model for older people based on the Chinese Longitudinal Healthy Longevity Survey using the stacking ensemble strategy.

MATERIAL AND METHODS

A total of 12,769 participants aged 65 years or older at baseline were included. Ensemble machine learning was used to develop the mortality prediction model. We selected three base learners (logistic regression, eXtreme Gradient Boosting [XGBoost], and Categorical Boosting [CatBoost]) and used logistic regression as the meta-learner. The primary outcome was five-year survival. Variable importance was evaluated with the SHapley Additive exPlanations (SHAP) method.
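The stacking strategy described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the data are synthetic, the two features are hypothetical, and a one-split decision stump stands in for the boosted-tree learners (XGBoost and CatBoost). The five-fold out-of-fold scheme is also an assumption, since the abstract does not describe how the meta-learner's inputs were generated. All function names (`fit_logistic`, `fit_stump`, `stack_fit`, `stack_predict`) are invented for illustration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_logistic(w, X):
    """Predicted probabilities for weights w = [bias, w1, ...]."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_logistic(w, [xi])[0] - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fit_stump(X, y):
    """One-split tree (stand-in for a boosted-tree learner), chosen by squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - pl) ** 2 for v in left) + sum((v - pr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, pl, pr)
    return best[1:]  # (feature index, threshold, left mean, right mean)

def predict_stump(model, X):
    j, t, pl, pr = model
    return [pl if xi[j] <= t else pr for xi in X]

# Base learners as (fit, predict) pairs; the meta-learner is logistic regression.
LEARNERS = [(fit_logistic, predict_logistic), (fit_stump, predict_stump)]

def stack_fit(X, y, learners=LEARNERS, k=5, seed=0):
    """Build out-of-fold base predictions, fit the meta-learner, refit bases on all data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    meta_X = [[0.0] * len(learners) for _ in X]
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, (fit, predict) in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i, p in zip(fold, predict(model, [X[i] for i in fold])):
                meta_X[i][j] = p  # prediction from a model that never saw row i
    meta = fit_logistic(meta_X, y)
    base = [fit(X, y) for fit, _ in learners]
    return base, meta

def stack_predict(base, meta, X, learners=LEARNERS):
    """Feed base-learner probabilities through the meta-learner."""
    cols = [predict(m, X) for m, (_, predict) in zip(base, learners)]
    return predict_logistic(meta, [list(row) for row in zip(*cols)])

# Synthetic data: two hypothetical standardized predictors and a binary outcome.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(1.5 * a + 1.0 * b) else 0 for a, b in X]

base, meta = stack_fit(X, y)
probs = stack_predict(base, meta, X)
print("first five stacked probabilities:", [round(p, 3) for p in probs[:5]])
```

Fitting the meta-learner on out-of-fold predictions keeps it from rewarding base learners that merely memorize the training rows. With real data, one would more likely reach for scikit-learn's `StackingClassifier` combined with the XGBoost and CatBoost classifiers than hand-written learners like these.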

RESULTS

The mean age at baseline was 88 years, and 57.8% of participants were women. Among the three base learners, the CatBoost model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.8469 (95% CI: 0.8345-0.8593). The stacking ensemble model further improved discrimination (AUC = 0.8486, 95% CI: 0.8367-0.8612, P = 0.046). Conventional logistic regression had comparable performance (AUC = 0.8470, 95% CI: 0.8346-0.8595). Older age, higher scores for self-care activities of daily living, being male, higher objective physical performance capacity scores, not undertaking housework, and lower scores on the Mini-Mental State Examination contributed to higher mortality risk.
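To make the reported metric concrete, the sketch below computes an AUC and a confidence interval for it. The abstract does not state how its intervals or P value were obtained, so the percentile bootstrap used here is an assumption, chosen only because it is a common option. The function names (`auc`, `bootstrap_auc_ci`) and the simulated scores are hypothetical.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 1/2)."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j shared by tied scores
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement and recompute the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated outcomes and risk scores, for illustration only.
rng = random.Random(1)
y_demo = [rng.randint(0, 1) for _ in range(300)]
s_demo = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y_demo]

point = auc(y_demo, s_demo)
ci_low, ci_high = bootstrap_auc_ci(y_demo, s_demo)
print(f"AUC = {point:.4f}, 95% CI: {ci_low:.4f}-{ci_high:.4f}")
```

Comparing two models fitted to the same participants, as in the stacking versus CatBoost contrast, would additionally require a paired procedure (for example, bootstrapping the difference in AUC on the same resamples), which this sketch leaves out.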

CONCLUSIONS

We constructed and validated several mortality risk prediction models for a Chinese population of older adults. Although the stacking ensemble approach had the best predictive performance, its improvement over conventional logistic regression was insubstantial.
