Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, Florida, USA.
J Comput Biol. 2022 Jul;29(7):724-737. doi: 10.1089/cmb.2021.0595. Epub 2022 May 12.
Microbial associations are characterized by both direct and indirect interactions between the constituent taxa in a microbial community, and play an important role in determining the structure, organization, and function of the community. Microbial associations can be represented using a weighted graph (microbial network), whose nodes represent taxa and edges represent pairwise associations. A microbial network is typically inferred from a sample-taxa matrix that is obtained by sequencing multiple biological samples and identifying the taxa counts in each sample. However, it is known that microbial associations are impacted by environmental and/or host factors. Thus, a sample-taxa matrix generated in a microbiome study involving a wide range of values for the environmental and/or clinical metadata variables may in fact be associated with more than one microbial network. In this study, we consider the problem of inferring multiple microbial networks from a given sample-taxa count matrix. Each sample is a count vector assumed to be generated by a mixture model consisting of component distributions that are multivariate Poisson log-normal. We present a variational expectation maximization algorithm for the model selection problem to infer the correct number of components of this mixture model. Our approach involves reframing the mixture model as a latent variable model, treating only the mixing coefficients as parameters, and subsequently approximating the marginal likelihood using an evidence lower bound framework. Our algorithm is evaluated on a large simulated dataset generated using a collection of different graph structures (band, hub, cluster, random, and scale-free).
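The generative assumption stated above (each sample's count vector drawn from a mixture whose components are multivariate Poisson log-normal) can be illustrated with a minimal simulation sketch. This is not the authors' implementation or their inference algorithm; the function name `sample_mpln_mixture`, the two-component setup, and all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_mpln_mixture(n_samples, weights, means, covs, rng):
    """Draw count vectors from a mixture of multivariate Poisson log-normal components.

    weights: mixing coefficients (non-negative, summing to 1)
    means, covs: per-component mean vector and covariance of the latent Gaussian
    """
    weights = np.asarray(weights, dtype=float)
    # Choose one mixture component per sample according to the mixing coefficients.
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    n_taxa = len(means[0])
    counts = np.empty((n_samples, n_taxa), dtype=np.int64)
    for i, k in enumerate(labels):
        # Latent layer: Gaussian log-rates, one per taxon.
        z = rng.multivariate_normal(means[k], covs[k])
        # Observed layer: independent Poisson counts given the latent log-rates.
        counts[i] = rng.poisson(np.exp(z))
    return counts, labels


# Hypothetical parameters: 2 components, 3 taxa (illustrative values only).
rng = np.random.default_rng(0)
weights = [0.6, 0.4]
means = [np.array([1.0, 2.0, 0.5]), np.array([2.5, 0.5, 1.5])]
covs = [
    np.array([[0.3, 0.1, 0.0], [0.1, 0.3, 0.1], [0.0, 0.1, 0.3]]),
    np.array([[0.3, -0.1, -0.1], [-0.1, 0.3, 0.0], [-0.1, 0.0, 0.3]]),
]
counts, labels = sample_mpln_mixture(200, weights, means, covs, rng)
print(counts.shape)  # sample-taxa count matrix: rows are samples, columns are taxa
```

In this sketch the component labels are returned only so the simulation can be checked; in the inference problem described in the abstract they are latent, and the number of components itself has to be selected.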