Thomas Diana M, Knight Rob, Gilbert Jack A, Cornelis Marilyn C, Gantz Marie G, Burdekin Kate, Cummiskey Kevin, Sumner Susan C J, Pathmasiri Wimal, Sazonov Edward, Gabriel Kelley Pettee, Dooley Erin E, Green Mark A, Pfluger Andrew, Kleinberg Samantha
Department of Mathematical Sciences, United States Military Academy, West Point, New York, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California, USA.
Obesity (Silver Spring). 2024 May;32(5):857-870. doi: 10.1002/oby.23989. Epub 2024 Mar 1.
Big Data are increasingly used in obesity and nutrition research to gain new insights and derive personalized guidance; however, these data in raw form are often not usable. Substantial preprocessing, which involves machine learning (ML), human judgment, and specialized software, is required to transform Big Data into artificial intelligence (AI)- and ML-ready data. These preprocessing steps are the most complex part of the entire modeling pipeline. End users' understanding of the complexity of these steps is critical for reducing misunderstanding, faulty interpretation, and erroneous downstream conclusions.
We reviewed three popular obesity/nutrition Big Data sources: microbiome, metabolomics, and accelerometry. The preprocessing pipelines, specialized software, challenges, and how decisions impact final AI- and ML-ready products were detailed.
Opportunities for advances to improve quality control, speed of preprocessing, and intelligent end user consumption were presented.
Big Data have the exciting potential for identifying new modifiable factors that impact obesity research. However, to ensure accurate interpretation of conclusions arising from Big Data, the choices involved in preparing AI- and ML-ready data need to be transparent to investigators and clinicians relying on the conclusions.