Huang Yushu, Yang Xifan, Wang Qi, Abula Adila, Dong Yue, Li Wenyuan
Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Department of Big Data in Health Science School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
Zhejiang Provincial Key Laboratory of Intelligent Preventive Medicine, Hangzhou, Zhejiang, China.
Front Aging Neurosci. 2025 Jul 16;17:1532884. doi: 10.3389/fnagi.2025.1532884. eCollection 2025.
Conventional machine learning (ML) approaches for constructing biological age (BA) have predominantly relied on blood-based markers, limiting their scope. This study aims to develop and validate novel ML-based BA models using a comprehensive set of clinical, behavioral, and socioeconomic factors and evaluate their predictive performance for mortality.
We analyzed data from 24,985 participants in the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2010, with follow-up until death, loss to follow-up, or 31 December 2019, whichever came first. Thirty features, including blood and urine biochemistry, physical examination data, behavioral traits, and socioeconomic factors, were selected using the Least Absolute Shrinkage and Selection Operator (LASSO). These features were used to train deep neural network (DNN) and ensemble learning models, yielding the Deep Biological Age (DBA) and Ensemble Biological Age (EnBA) measures, respectively, with chronological age (CA) as the reference label. Model performance was assessed using mean absolute error (MAE), and interpretability was explored using SHapley Additive exPlanations (SHAP). The predictive accuracy of DBA and EnBA for mortality was compared with that of Phenotypic Age (PhenoAge) using the area under the curve (AUC) and hazard ratios (HR) derived from Cox proportional hazards models adjusted for demographics and lifestyle factors. Sensitivity analyses were performed to assess robustness.
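As an illustration of the performance metric named above, the following minimal sketch computes the MAE between predicted biological age and chronological age. The function name and the example ages are hypothetical and are not taken from the study; the authors' actual evaluation code is not described in the abstract.

```python
def mean_absolute_error(chronological_ages, predicted_ages):
    """Average absolute gap, in years, between predicted and chronological age."""
    if not chronological_ages or len(chronological_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total_gap = sum(
        abs(predicted - actual)
        for actual, predicted in zip(chronological_ages, predicted_ages)
    )
    return total_gap / len(chronological_ages)


# Hypothetical example: three participants with made-up ages.
example_ca = [50, 60, 70]
example_ba = [52, 57, 70]
example_mae = mean_absolute_error(example_ca, example_ba)
```

A lower MAE means the model's age estimates sit closer to chronological age on average; it does not by itself say anything about mortality prediction, which the study evaluates separately.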
DBA and EnBA closely tracked chronological age (MAE = 2.98 and 3.58 years, respectively) and showed strong predictive capability for all-cause mortality, with AUCs of 0.896 (95% CI: 0.891-0.898) for DBA and 0.889 (95% CI: 0.884-0.894) for EnBA. Greater DBA and EnBA acceleration was significantly associated with increased mortality risk (HR = 1.059 and 1.039, respectively). SHAP analysis identified prescription medication use, hepatitis B surface antibody status, and vigorous physical activity as the most influential features in DBA predictions. Furthermore, BA acceleration was associated with an elevated risk of death from specific chronic conditions, including cardiovascular and cerebrovascular diseases and cancer.
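To help read the hazard ratios, the sketch below rests on two assumptions that the abstract does not state: that BA acceleration is the simple difference between biological and chronological age, and that each HR is expressed per one-year increase in acceleration. Under a Cox model, where the log hazard is linear in the covariate, the HR for a k-unit increase is the per-unit HR raised to the power k. The function names are hypothetical.

```python
def age_acceleration(chronological_age, biological_age):
    """Assumed definition: biological age minus chronological age, in years."""
    return biological_age - chronological_age


def scaled_hazard_ratio(hr_per_unit, units):
    """HR for a `units`-sized increase, given the HR per one-unit increase.

    Follows from the Cox model's log-linear form: exp(beta * k) == exp(beta) ** k.
    """
    return hr_per_unit ** units


# Assumption: the reported HRs (1.059 for DBA, 1.039 for EnBA) are per year.
dba_hr_5_years = scaled_hazard_ratio(1.059, 5)
enba_hr_5_years = scaled_hazard_ratio(1.039, 5)
```

If the published HRs were instead reported per standard deviation or another unit, the same exponent rule applies, but with that unit in place of years.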
Our study successfully developed and validated two ML-based BA models capable of accurately predicting both all-cause and cause-specific mortality. These findings suggest that the DBA and EnBA models hold promise for early identification of high-risk individuals, potentially facilitating timely preventive interventions and improving population health outcomes.