Data Analytics CoE, Data R&D Center, SK Telecom, Seoul, 04539, Republic of Korea.
J Microbiol. 2020 Mar;58(3):206-216. doi: 10.1007/s12275-020-0066-8. Epub 2020 Feb 27.
Research on the microbiome is being actively conducted worldwide, and the results show that the human gut bacterial environment significantly affects the immune system, psychological conditions, cancers, obesity, and metabolic diseases. Thanks to advances in sequencing technology, microbiome studies with large numbers of samples are now feasible at an acceptable cost. Large sample sizes allow more sophisticated modeling with machine learning approaches to study the relationships between the microbiome and various traits. This article provides an overview of machine learning methods for non-data scientists interested in association analysis of microbiomes and host phenotypes. Once the genomic features of the microbiome are determined, various analysis methods can be used to explore the relationship between the microbiome and host phenotypes, including penalized regression, support vector machine (SVM), random forest, and artificial neural network (ANN). Deep neural network methods are also touched upon. The analysis procedure, from environment setup to extraction of analysis results, is presented in the Python programming language.
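As a rough illustration of the kind of workflow the abstract describes, the following minimal sketch compares the four named method families with cross-validation. It is not the article's own code. It assumes scikit-learn is installed, uses a synthetic feature table from `make_classification` as a stand-in for real microbiome abundance data, and the function name `compare_models` and all hyperparameter values are hypothetical choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, cv=5, seed=0):
    """Return the mean cross-validated accuracy of each method family.

    X: (n_samples, n_features) feature table (hypothetical stand-in for
       per-sample microbiome features).
    y: binary host phenotype labels.
    """
    models = {
        # L2-penalized (ridge-type) logistic regression; smaller C means
        # a stronger penalty.
        "penalized_regression": make_pipeline(
            StandardScaler(), LogisticRegression(C=0.1, max_iter=1000)
        ),
        # Support vector machine with an RBF kernel.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        # Random forest; tree models do not need feature scaling.
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=seed
        ),
        # Small artificial neural network with one hidden layer.
        "ann": make_pipeline(
            StandardScaler(),
            MLPClassifier(
                hidden_layer_sizes=(32,), max_iter=1000, random_state=seed
            ),
        ),
    }
    scores = {}
    for name, model in models.items():
        fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    # Synthetic data only: 200 "samples" with 50 "features", 10 informative.
    X, y = make_classification(
        n_samples=200, n_features=50, n_informative=10, random_state=0
    )
    for name, acc in compare_models(X, y).items():
        print(f"{name}: mean CV accuracy = {acc:.3f}")
```

With real data, `X` would be replaced by a sample-by-feature table derived from sequencing and `y` by the measured host phenotype. The scores on synthetic data say nothing about how these methods rank on actual microbiome studies.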