Department of Mathematics and Statistics, Dalhousie University, Halifax, NS Canada.
Department of Biology, Dalhousie University, Halifax, NS Canada.
Microbiome. 2015 Mar 10;3:8. doi: 10.1186/s40168-015-0073-x. eCollection 2015.
Microbiome samples often represent mixtures of communities, where each community is composed of overlapping assemblages of species. Such mixtures are complex: the number of species is huge, and abundance information for many species is often sparse. Classical methods have limited value for identifying complex features within such data.
Here, we describe a novel hierarchical model for Bayesian inference of microbial communities (BioMiCo). The model takes abundance data derived from environmental DNA, and models the composition of each sample by a two-level hierarchy of mixture distributions constrained by Dirichlet priors. BioMiCo is supervised, using known features for samples and appropriate prior constraints to overcome the challenges posed by many variables, sparse data, and large numbers of rare species. The model is trained on a portion of the data, where it learns how assemblages of species are mixed to form communities and how assemblages are related to the known features of each sample. Training yields a model that can predict the features of new samples. We used BioMiCo to build models for three serially sampled datasets and tested their predictive accuracy across different time points. The first model was trained to predict both body site (hand, mouth, and gut) and individual human host. It was able to reliably distinguish these features across different time points. The second was trained on vaginal microbiomes to predict both the Nugent score and individual human host. We found that women having normal and elevated Nugent scores had distinct microbiome structures that persisted over time, with additional structure within women having elevated scores. The third was trained for the purpose of assessing seasonal transitions in a coastal bacterial community. Application of this model to a high-resolution time series permitted us to track the rate and time of community succession and accurately predict known ecosystem-level events.
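The two-level hierarchy of mixtures described above can be illustrated with a small generative simulation. This is a minimal sketch under stated assumptions, not the BioMiCo implementation: the function name `simulate_samples`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and the choice to tie one assemblage mixture to each known feature label are all illustrative. It only shows how known sample features could map to mixtures of assemblages, and assemblages to mixtures of species; it performs no inference.

```python
import numpy as np

def simulate_samples(labels, n_assemblages=3, n_species=20,
                     reads_per_sample=500, alpha=0.5, beta=0.5, seed=0):
    """Simulate species counts from a two-level mixture (illustrative only).

    labels: one integer feature label per sample (e.g. body site), 0-based.
    alpha, beta: assumed symmetric Dirichlet hyperparameters.
    """
    rng = np.random.default_rng(seed)
    n_labels = int(np.max(labels)) + 1

    # Lower level: each assemblage is a distribution over species.
    phi = rng.dirichlet(np.full(n_species, beta), size=n_assemblages)
    # Upper level: each feature label is a distribution over assemblages.
    theta = rng.dirichlet(np.full(n_assemblages, alpha), size=n_labels)

    counts = np.zeros((len(labels), n_species), dtype=int)
    for i, lab in enumerate(labels):
        # Split the sample's reads among assemblages ...
        reads_per_assemblage = rng.multinomial(reads_per_sample, theta[lab])
        # ... then draw species for each assemblage's share of reads.
        for k, n_k in enumerate(reads_per_assemblage):
            counts[i] += rng.multinomial(n_k, phi[k])
    return counts, theta, phi

if __name__ == "__main__":
    counts, theta, phi = simulate_samples([0, 0, 1, 2])
    print(counts.shape)  # one row per sample, one column per species
```

Dirichlet parameters below 1 favour sparse mixtures, which is one way a prior can reflect data in which most species are rare. A supervised model of the kind the abstract describes would run in the opposite direction, estimating the two sets of mixture weights from labeled training counts and then using them to predict the labels of new samples.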
BioMiCo provides a framework for learning the structure of microbial communities and for making predictions based on microbial assemblages. By training on carefully chosen features (abiotic or biotic), BioMiCo can be used to understand and predict transitions between complex communities composed of hundreds of microbial species.