Genome Evolution Laboratory, National Institute of Genetics, Mishima, Japan.
Department of Biological Information, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan.
PLoS Comput Biol. 2018 Jun 6;14(6):e1006143. doi: 10.1371/journal.pcbi.1006143. eCollection 2018 Jun.
As data on microbial community structures from various environments have accumulated, studies have examined the relationship between the environmental labels assigned to retrieved microbial samples and their community structures. However, because environments change continuously over time and space, mixed states of environments and their effects on community formation should be considered, rather than only the effects of discrete environmental categories. Here we applied a hierarchical Bayesian model to paired datasets containing more than 30,000 samples of microbial community structures and sample description documents. From the training results, we extracted latent environmental topics that associate co-occurring microbes with co-occurring word sets among samples. Topics are the core elements of environmental mixtures, and topic-based visualization of samples clarifies the connections among various environments. Based on the model training results, we developed a web application, LEA (Latent Environment Allocation), which provides a way to evaluate the typicality and heterogeneity of microbial communities in newly obtained samples without restricting the environmental categories to be compared. Because topics link words and microbes, LEA also makes it possible to search the 30,000 microbiome samples for those semantically related to a query.