The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada.
Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece.
Med Sci Monit. 2019 Mar 17;25:1994-2001. doi: 10.12659/MSM.913283.
BACKGROUND Studies on the effects of sociodemographic factors on health in aging now include the use of statistical models and machine learning. The aim of this study was to evaluate the determinants of health in aging using machine learning methods and to compare their accuracy with that of traditional methods.
MATERIAL AND METHODS The health status of 6,209 adults aged <65 years (n=1,585), 65-79 years (n=3,267), and >80 years (n=1,357) was measured using an established health metric (0-100) that incorporated physical function and activities of daily living (ADL). Data from the English Longitudinal Study of Ageing (ELSA) included socio-economic and sociodemographic characteristics and history of falls. Health-trend and personal-fitted variables were generated as predictors of the health metric using three machine learning methods: random forest (RF), deep learning (DL), and the linear model (LM). The importance of a given predictive variable was measured as the percentage increase in mean square error (%IncMSE) when the variable was removed from the model.
RESULTS Health-trend, physical activity, and personal-fitted variables were the main predictors of health, with %IncMSE values of 85.76%, 63.40%, and 46.71%, respectively. Age, employment status, alcohol consumption, and household income had %IncMSE values of 20.40%, 20.10%, 16.94%, and 13.61%, respectively. Performance of the RF method was similar to that of the traditional LM (p=0.7), but RF significantly outperformed DL (p=0.006).
CONCLUSIONS Machine learning methods can be used to evaluate multidimensional longitudinal health data and may provide accurate results with fewer requirements when compared with traditional statistical modeling.
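The %IncMSE measure described in the methods can be illustrated with a minimal sketch. This is not the authors' code: it assumes a plain linear model refit without each variable in turn, uses synthetic data rather than ELSA, and the variable names (`activity`, `income`, `noise`) and helper functions are hypothetical. It follows the abstract's wording (importance as the percentage increase in MSE when a variable is removed); random forest implementations may compute the measure differently.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def mse(beta, X, y):
    """Mean square error of the fitted linear model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def drop_column(X, j):
    """Copy of X without column j."""
    return [[v for i, v in enumerate(r) if i != j] for r in X]

def pct_inc_mse(X_train, y_train, X_test, y_test, names):
    """Percentage increase in test MSE when each variable is removed and the model refit."""
    base = mse(fit_ols(X_train, y_train), X_test, y_test)
    scores = {}
    for j, name in enumerate(names):
        beta = fit_ols(drop_column(X_train, j), y_train)
        reduced = mse(beta, drop_column(X_test, j), y_test)
        scores[name] = 100.0 * (reduced - base) / base
    return scores

# Synthetic data (assumption): a 0-100 style score driven mostly by "activity".
random.seed(0)
names = ["activity", "income", "noise"]
X = [[random.gauss(0, 1) for _ in names] for _ in range(600)]
y = [50 + 8 * r[0] + 2 * r[1] + random.gauss(0, 1) for r in X]
scores = pct_inc_mse(X[:400], y[:400], X[400:], y[400:], names)
for name in names:
    print(f"{name}: %IncMSE = {scores[name]:.1f}")
```

A larger %IncMSE means the model loses more accuracy without that variable, which is how the ranking of health-trend, physical activity, and personal-fitted variables in the results should be read.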