Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study.

Affiliations

School of Health Sciences, Kristiania University College, Oslo, Norway.

Clinical Trials Unit, Warwick Medical School, University of Warwick, Coventry, United Kingdom.

Publication information

J Med Internet Res. 2021 Jul 16;23(7):e22021. doi: 10.2196/22021.

Abstract

BACKGROUND

Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life.

OBJECTIVE

The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression in examining the extent to which continuous outcomes (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life and whether academic performance and quality of life are associated.

METHODS

We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared to that of linear models, with and without imputation.
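The train/validation comparison described above can be illustrated with a minimal sketch. It does not use the study's data or its models: the data are simulated, the single predictor and its effect size are hypothetical, and a 1-nearest-neighbor predictor stands in for a flexible learner that can memorize its training set. It only shows why variance explained should be judged on the validation set rather than the training set.

```python
import random

random.seed(0)

def simulate(n):
    # Hypothetical data: one predictor with a weak linear effect plus noise.
    points = []
    for _ in range(n):
        x = random.uniform(0, 10)
        y = 0.5 * x + random.gauss(0, 3)
        points.append((x, y))
    return points

# Random split into training and validation sets.
data = simulate(1500)
random.shuffle(data)
train, valid = data[:1000], data[1000:]

def fit_line(points):
    # Ordinary least squares with a single predictor.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_1nn(points):
    # 1-nearest-neighbor: predict the outcome of the closest training point.
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def r_squared(predict, points):
    # Proportion of variance explained; negative when worse than the mean.
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - predict(x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

line = fit_line(train)
nn = fit_1nn(train)

r2_line_train = r_squared(line, train)
r2_line_valid = r_squared(line, valid)
r2_nn_train = r_squared(nn, train)
r2_nn_valid = r_squared(nn, valid)

print(f"linear  train R2={r2_line_train:.2f}  validation R2={r2_line_valid:.2f}")
print(f"1-NN    train R2={r2_nn_train:.2f}  validation R2={r2_nn_valid:.2f}")
```

In this sketch the 1-nearest-neighbor predictor reproduces its training data exactly, yet it generalizes worse than the straight line because it has fitted the noise along with the signal.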

RESULTS

We included data for 1711 children. Regression models explained 24% of academic performance variance in the real complete-case validation set, and up to 15% in quality of life. While machine learning techniques explained high proportions of variance in training sets, in validation, machine learning techniques explained approximately 0% of academic performance and 3% to 8% of quality of life. With imputation, machine learning techniques improved to 15% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables. The best predictors of academic performance in adjusted models were the child's mother having a master-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71), increased television and computer use (P=.03; β=1.19, 95% CI 0.25 to 3.71), and dichotomized self-reported exercise (P=.001; β=2.47, 95% CI 1.08 to 3.87). For quality of life, self-reported exercise (P<.001; β=1.09, 95% CI 0.53 to 1.66) and increased television and computer use (P=.002; β=-0.95, 95% CI -1.55 to -0.36) were the best predictors. Adjusted academic performance was associated with quality of life (P=.02; β=0.12, 95% CI 0.02 to 0.22).

CONCLUSIONS

Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not sufficiently to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education are predictive of academic performance or quality of life. Academic performance is associated with quality of life after adjusting for lifestyle variables and may offer another promising intervention target to improve quality of life in children.
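The point about nonlinear relationships can be sketched in the same way. The example below is an illustration under stated assumptions, not the authors' simulation: the curvilinear outcome (a quadratic in one hypothetical predictor), the noise level, and the choice of a 15-nearest-neighbor average as the flexible method are all invented for demonstration.

```python
import heapq
import random

random.seed(1)

def simulate(n):
    # Hypothetical curvilinear relationship: y depends on x squared.
    points = []
    for _ in range(n):
        x = random.uniform(-3, 3)
        y = x * x + random.gauss(0, 1)
        points.append((x, y))
    return points

data = simulate(1200)
random.shuffle(data)
train, valid = data[:800], data[800:]

def fit_line(points):
    # Ordinary least squares with a single untransformed predictor.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_knn(points, k):
    # k-nearest-neighbor regression: average the k closest training outcomes.
    def predict(x):
        nearest = heapq.nsmallest(k, points, key=lambda p: abs(p[0] - x))
        return sum(y for _, y in nearest) / k
    return predict

def r_squared(predict, points):
    # Proportion of variance explained on the given set.
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - predict(x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

r2_line_valid = r_squared(fit_line(train), valid)
r2_knn_valid = r_squared(fit_knn(train, 15), valid)

print(f"linear validation R2={r2_line_valid:.2f}")
print(f"15-NN  validation R2={r2_knn_valid:.2f}")
```

Because the simulated predictor is symmetric around zero, a straight line captures almost none of the quadratic signal, while the neighbor average tracks the curve. Averaging over 15 neighbors rather than 1 is what keeps the flexible method from memorizing noise here.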

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99e6/8325075/a3662a6036a2/jmir_v23i7e22021_fig1.jpg
