Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
Adv Nutr. 2021 Jun 1;12(3):621-631. doi: 10.1093/advances/nmaa183.
The field of nutritional epidemiology faces challenges posed by measurement error, diet as a complex exposure, and residual confounding. The objective of this perspective article is to highlight how developments in big data and machine learning can help address these challenges. New methods of collecting 24-h dietary recalls and recording diet could enable larger samples and more repeated measures, increasing statistical power and measurement precision. In addition, machine learning that automatically classifies pictures of food could become a useful complementary method for improving the precision and validity of dietary measurements. Diet is complex because thousands of different foods are consumed in varying proportions, in quantities that fluctuate over time, and in differing combinations. Current dietary pattern methods may not integrate sufficient dietary variation, and most traditional modeling approaches incorporate interactions and nonlinearity only to a limited extent. Machine learning could help better model diet as a complex exposure with nonadditive and nonlinear associations. Last, novel big data sources could help avoid unmeasured confounding by offering more covariates, including both omics data and features derived from unstructured data with machine learning methods. These opportunities notwithstanding, big data and machine learning must be applied cautiously to ensure the quality of dietary measurements, avoid overfitting, and confirm that interpretations are accurate. Greater use of machine learning and big data would also require substantial investments in training, collaborations, and computing infrastructure. Overall, we propose that judicious application of big data and machine learning in nutrition science could offer new means of dietary measurement, more tools to model the complexity of diet and its relations with diseases, and additional potential ways of addressing confounding.
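As a rough illustration of why more repeated 24-h recalls can improve measurement precision, the sketch below assumes a classical additive measurement-error model: each recall equals a person's true usual intake plus independent day-to-day error. Under that assumption, regressing an outcome on the mean of n recalls attenuates the true slope by the factor var_between / (var_between + var_within / n). The function names, variance values, and sample size are hypothetical choices for illustration, not the authors' method or data.

```python
import random
import statistics


def attenuation_factor(var_between: float, var_within: float, n_recalls: int) -> float:
    """Theoretical regression-dilution factor for the mean of n recalls,
    assuming classical additive, independent within-person error (hypothetical model)."""
    return var_between / (var_between + var_within / n_recalls)


def simulated_slope(n_people: int, n_recalls: int, true_slope: float,
                    var_between: float, var_within: float, seed: int = 0) -> float:
    """Simulate observed OLS slope of outcome on mean recalled intake (hypothetical data)."""
    rng = random.Random(seed)
    xs_obs, ys = [], []
    for _ in range(n_people):
        true_intake = rng.gauss(0.0, var_between ** 0.5)
        # Each recall = true usual intake + independent day-to-day error.
        recalls = [true_intake + rng.gauss(0.0, var_within ** 0.5) for _ in range(n_recalls)]
        xs_obs.append(statistics.fmean(recalls))
        # Outcome depends on TRUE intake plus unit-variance noise.
        ys.append(true_slope * true_intake + rng.gauss(0.0, 1.0))
    # Ordinary least-squares slope: sum of cross-deviations / sum of squared deviations.
    mx, my = statistics.fmean(xs_obs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs_obs, ys))
    sxx = sum((x - mx) ** 2 for x in xs_obs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical variances: within-person variance set larger than between-person.
    for n in (1, 4, 8):
        print(n, round(attenuation_factor(1.0, 3.0, n), 3),
              round(simulated_slope(20000, n, 1.0, 1.0, 3.0), 3))
```

With these hypothetical variances, the theoretical factor rises from 0.25 with a single recall to about 0.57 with four, so the observed association moves closer to the true slope of 1 as recalls are added. The model ignores systematic (e.g., under-reporting) error, which repeated recalls alone do not remove.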