Zhou Jie, Gui Jiang, Viles Weston D, Chen Haobin, Li Siting, Madan Juliette C, Coker Modupe O, Hoen Anne G
Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, United States.
Khoury College of Computer Science, Northeastern University, Portland, ME, United States.
Front Microbiomes. 2024;3. doi: 10.3389/frmbi.2024.1366948. Epub 2024 Jun 2.
The microbial interactions within the human microbiome are complex, and few methods are available to identify these interactions within a longitudinal microbial abundance framework. Existing methods typically impose restrictive constraints on the data format, such as requiring long sequences and equal spacing, and these constraints are violated in many cases.
To identify microbial interaction networks (MINs) in general longitudinal data settings, we propose a stationary Gaussian graphical model (SGGM) based on 16S rRNA gene sequencing data. In the SGGM, data can be arbitrarily spaced, and there are no restrictions on the length of the data sequence from a single subject. Based on the SGGM, EM-type algorithms are devised to compute the ℓ1-penalized maximum likelihood estimate of MINs. The algorithms use the classical graphical LASSO algorithm as their building block and can be implemented efficiently.
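The abstract states only that the graphical LASSO is the building block of the EM-type algorithms; it does not give the EM updates, the temporal correlation structure, or the data transformation. As an illustration of that building block alone, here is a minimal NumPy sketch of the classical graphical lasso in its block coordinate descent form. Given a sample covariance matrix S and a penalty λ, it returns a sparse precision matrix Θ that maximizes log det Θ − tr(SΘ) − λ‖Θ‖₁ (with the diagonal also penalized); zero off-diagonal entries of Θ correspond to missing edges in the estimated network. The function names (`graphical_lasso`, `lasso_coordinate_descent`, `soft_threshold`) are hypothetical, this is not the authors' implementation, and it treats all samples as independent, so it ignores the within-subject correlation that the SGGM is designed to capture.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_coordinate_descent(V, u, lam, beta, max_iter=200, tol=1e-10):
    """Minimize 0.5 * b'Vb - u'b + lam * ||b||_1 by cyclic coordinate descent.

    `beta` is used as a warm start and updated in place.
    """
    for _ in range(max_iter):
        max_change = 0.0
        for k in range(len(u)):
            # Partial residual: u_k minus the contributions of all other coordinates.
            r = u[k] - V[k] @ beta + V[k, k] * beta[k]
            new_value = soft_threshold(r, lam) / V[k, k]
            max_change = max(max_change, abs(new_value - beta[k]))
            beta[k] = new_value
        if max_change < tol:
            break
    return beta


def graphical_lasso(S, lam, max_iter=100, tol=1e-8):
    """Hypothetical sketch: sparse precision matrix maximizing
    log det(Theta) - tr(S Theta) - lam * ||Theta||_1 (diagonal penalized too)."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    W = S + lam * np.eye(p)   # working covariance estimate; its diagonal stays fixed
    B = np.zeros((p, p))      # column j stores the lasso coefficients for node j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j          # all nodes except j
            W11 = W[np.ix_(idx, idx)]
            beta = lasso_coordinate_descent(W11, S[idx, j], lam, B[idx, j].copy())
            B[idx, j] = beta
            w12 = W11 @ beta                 # updated covariances between j and the rest
            W[idx, j] = w12
            W[j, idx] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover the precision matrix column by column from W and the lasso coefficients.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[j, j] = theta_jj
        Theta[idx, j] = -B[idx, j] * theta_jj
    # Average with the transpose to remove small numerical asymmetries.
    return (Theta + Theta.T) / 2.0
```

With λ = 0 and a positive-definite S, the routine simply inverts S; larger values of λ tend to set more off-diagonal entries of Θ to zero, i.e., to remove edges from the estimated network.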
Extensive simulation studies show that the proposed algorithms can significantly outperform conventional algorithms when the correlations among the longitudinal data are reasonably high. When the assumptions of the SGGM are violated, e.g., by zero inflation or by data from heterogeneous microbial communities, the proposed algorithms remain robust and still perform better than the other existing algorithms. The algorithms are applied to a 16S rRNA gene sequencing data set from patients with cystic fibrosis. The results provide strong evidence of an association between the MINs and the phylogenetic tree, indicating that genetically related taxa tend to have more and/or stronger interactions. These results strengthen existing findings in the literature.
The proposed algorithms can potentially also be used to explore network structures in the genome, the metabolome, etc.