Cao Xingqi, Yang Guanglai, Jin Xurui, He Liu, Li Xueqin, Zheng Zhoutao, Liu Zuyun, Wu Chenkai
Department of Big Data in Health Science, School of Public Health and Center for Clinical Big Data and Analytics, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Global Health Research Center, Duke Kunshan University, Kunshan, China.
Front Med (Lausanne). 2021 Dec 1;8:698851. doi: 10.3389/fmed.2021.698851. eCollection 2021.
Biological age (BA) has been accepted as a more accurate proxy of aging than chronological age (CA). This study aimed to use machine learning (ML) algorithms to estimate BA in the Chinese population. We used data from 9,771 middle-aged and older Chinese adults (≥45 years) who took part in the 2011/2012 wave of the China Health and Retirement Longitudinal Study and were followed until 2018. We used several ML algorithms (e.g., Gradient Boosting Regressor, Random Forest, CatBoost Regressor, and Support Vector Machine) to develop new measures of biological aging (ML-BAs) based on physiological biomarkers. The R-squared value and mean absolute error (MAE) were used to identify the best-performing ML-BA. We used logistic regression models to examine the associations of the best ML-BA, and of a conventional aging measure that we previously developed, the Klemera and Doubal method BA (KDM-BA), with physical disability and mortality. The Gradient Boosting Regression model performed best, yielding an ML-BA with an R-squared value of 0.270 and an MAE of 6.519. This ML-BA was significantly associated with disability in basic activities of daily living, instrumental activities of daily living, lower extremity mobility, and upper extremity mobility, and with mortality, independent of CA; each 1-year increment in ML-BA corresponded to 1 to 7% higher odds (all P < 0.001). These associations were generally comparable to those of KDM-BA. This study provides a valid ML-based measure of biological aging for middle-aged and older Chinese adults. These findings support the application of ML in geroscience research and may help facilitate preventive and geroprotector intervention studies.
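The two model-selection metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions of R-squared and MAE; the abstract does not state how the authors computed them, and the function names and the four-person toy data below are hypothetical, not study data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (illustration only): chronological ages in years
# and the ages a fitted model predicted from biomarkers.
chronological_age = [50.0, 60.0, 70.0, 80.0]
predicted_age = [55.0, 58.0, 66.0, 77.0]

mae = mean_absolute_error(chronological_age, predicted_age)
r2 = r_squared(chronological_age, predicted_age)
print(f"MAE = {mae:.3f} years, R-squared = {r2:.3f}")
```

Under these definitions, a lower MAE and a higher R-squared both indicate predictions that track chronological age more closely, which is the sense in which the reported values (R-squared 0.270, MAE 6.519) rank the candidate models.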