Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
Inform Health Soc Care. 2024 Oct;49(3-4):162-176. doi: 10.1080/17538157.2024.2400247. Epub 2024 Sep 24.
The world's population is aging rapidly, leading to increased public health and economic burdens due to age-related cardiovascular and neurodegenerative diseases. Early risk detection is essential for prevention and for improving the quality of life of elderly individuals. Moreover, health risks associated with aging are not tied solely to chronological age; they are also influenced by a combination of environmental exposures. Past research has introduced the concept of "Phenotypic Age," which combines age with biomarkers to estimate an individual's health risk.
This study explores which factors contribute most to the gap between chronological and phenotypic ages. We applied ten machine learning regression techniques to the NHANES dataset, which contains demographic, laboratory and socioeconomic data from 41,474 patients, and combined them to identify the most important features. We then used clustering analysis and a mixed-effects model to stratify by sex, ethnicity, and education.
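The abstract does not say how the ten regressors were combined. As a minimal sketch of one common option, assuming mean-rank aggregation of per-model feature importances, the idea can be illustrated as follows. The function names, model names, feature names, and importance scores are all hypothetical and are not taken from the study.

```python
from statistics import mean


def rank_features(importances):
    """Return {feature: rank}; rank 1 has the largest absolute importance."""
    ordered = sorted(importances, key=lambda f: abs(importances[f]), reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_ranks(importances_by_model):
    """Average each feature's rank across models (lower mean rank = more important).

    Assumes every model reports an importance for the same set of features.
    """
    per_model_ranks = [rank_features(imp) for imp in importances_by_model.values()]
    features = per_model_ranks[0].keys()
    mean_rank = {f: mean(ranks[f] for ranks in per_model_ranks) for f in features}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Hypothetical importance scores for three illustrative models;
# these are NOT values or features reported by the study.
example = {
    "linear": {"bmi": 0.40, "income": -0.25, "smoking": 0.10},
    "forest": {"bmi": 0.30, "income": 0.20, "smoking": 0.50},
    "boosting": {"bmi": 0.45, "income": 0.20, "smoking": 0.35},
}

consensus = aggregate_ranks(example)
for feature, avg_rank in consensus:
    print(f"{feature}: mean rank {avg_rank:.2f}")
```

Ranking by absolute value lets signed coefficients (as from a linear model) be compared with non-negative importances (as from tree ensembles). Averaging ranks rather than raw scores avoids having to put the different models' importance scales on a common footing.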
We identified 28 demographic, biological and environmental factors related to a significant gap between phenotypic and chronological ages. After stratifying by sex, education and ethnicity, we found statistically significant differences in the outcome distributions.
By showing that health risk prevention should consider both biological and sociodemographic factors, we offer a new approach to predicting aging rates and potentially improving targeted prevention strategies for age-related conditions.