Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX, USA.
Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, USA.
BMC Bioinformatics. 2020 Dec 28;21(Suppl 21):581. doi: 10.1186/s12859-020-03911-w.
The estimation of microbial networks can provide important insight into the ecological relationships among the organisms that comprise the microbiome. However, there are a number of critical statistical challenges in the inference of such networks from high-throughput data. Since the abundances in each sample are constrained to have a fixed sum and there is incomplete overlap in microbial populations across subjects, the data are both compositional and zero-inflated.
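The two properties named above can be made concrete with a short sketch that splits a samples-by-taxa count matrix into binary presence indicators and continuous values. For illustration only, it assumes a centered log-ratio taken over the nonzero taxa of each sample, with zeros left at zero. The function name `split_zero_inflated` is hypothetical, and this is not the COZINE preprocessing code. Each row is centered on its own log geometric mean, so multiplying a sample by a constant (for example, a different sequencing depth) leaves its values unchanged. This is one standard way to remove the fixed-sum constraint.

```python
import numpy as np

def split_zero_inflated(counts):
    """Hypothetical helper: split a samples-by-taxa count matrix into
    binary presence indicators (V) and log-ratio values (Y).

    Assumption (illustrative, not COZINE's code): nonzero entries get a
    centered log-ratio computed over the nonzero taxa of the same sample;
    entries that are zero stay zero in Y.
    """
    counts = np.asarray(counts, dtype=float)
    presence = (counts > 0).astype(int)      # V: 1 if the taxon was observed
    values = np.zeros_like(counts)           # Y: 0 wherever V == 0
    for i, row in enumerate(counts):
        nz = row > 0
        if nz.any():
            logs = np.log(row[nz])
            # Centre on the log geometric mean of the observed taxa,
            # which cancels any per-sample scaling (library size).
            values[i, nz] = logs - logs.mean()
    return presence, values

counts = np.array([[10, 0, 30, 60],
                   [0, 5, 5, 0]])
presence, values = split_zero_inflated(counts)
print(presence)
print(np.round(values, 3))
```

Rescaling `counts` row-wise returns the same `values`, while `presence` records which taxa are absent. This separation is why a network can carry dependencies among the zeros as well as among the abundances.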
We propose the COmpositional Zero-Inflated Network Estimation (COZINE) method for inference of microbial networks, which addresses these critical aspects of the data while maintaining computational scalability. COZINE relies on the multivariate Hurdle model to infer a sparse set of conditional dependencies that reflect not only relationships among the continuous values, but also among binary indicators of presence or absence and between the binary and continuous representations of the data. Our simulation results show that the proposed method is better able to capture various types of microbial relationships than existing approaches. We demonstrate the utility of the method with an application to understanding the oral microbiome network in a cohort of leukemic patients.
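The abstract does not write out the model. The sketch below assumes one common parameterization of the multivariate Hurdle model, with binary indicators $v$, continuous values $y$, and $y_j = 0$ whenever $v_j = 0$: $\log p(v, y) = v^\top G v + v^\top H y - \tfrac{1}{2} y^\top K y + \text{const}$. Here $G$ carries binary-binary, $H$ binary-continuous, and $K$ continuous-continuous dependencies. Under this form, taxa $j$ and $k$ are conditionally independent when $G_{jk}$, $H_{jk}$, $H_{kj}$ and $K_{jk}$ are all zero. The helpers `hurdle_log_density_unnormalized` and `hurdle_edges` are hypothetical. They only read the three edge types off given parameter matrices and do not estimate them from data, which is the inference step the abstract describes.

```python
import numpy as np

def hurdle_log_density_unnormalized(v, y, G, H, K):
    """Hypothetical helper: v'Gv + v'Hy - y'Ky/2 (normalizing constant omitted),
    under the assumed Hurdle parameterization."""
    v = np.asarray(v, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any((v == 0) & (y != 0)):
        raise ValueError("y must be 0 wherever the taxon is absent (v == 0)")
    return v @ G @ v + v @ H @ y - 0.5 * y @ K @ y

def hurdle_edges(G, H, K, tol=1e-12):
    """Hypothetical helper: map each pair (j, k), j < k, to the set of
    dependence types with a nonzero parameter; pairs with none are omitted."""
    G, H, K = (np.asarray(m, dtype=float) for m in (G, H, K))
    edges = {}
    for j in range(G.shape[0]):
        for k in range(j + 1, G.shape[0]):
            kinds = set()
            if abs(G[j, k]) > tol:
                kinds.add("binary-binary")
            if abs(H[j, k]) > tol or abs(H[k, j]) > tol:
                kinds.add("binary-continuous")
            if abs(K[j, k]) > tol:
                kinds.add("continuous-continuous")
            if kinds:
                edges[(j, k)] = kinds
    return edges

# Toy three-taxon example with one edge of each type (made-up parameters).
G = np.array([[0.0, 0.8, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.0]])
H = np.zeros((3, 3)); H[1, 2] = 0.5
K = np.eye(3); K[0, 2] = K[2, 0] = -0.3
print(hurdle_edges(G, H, K))
```

The toy parameters give co-occurrence between taxa 0 and 1, presence of taxon 1 shifting the abundance of taxon 2, and an abundance-abundance link between taxa 0 and 2. A Gaussian graphical model fitted to the continuous values alone would capture only the last of these.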
Our proposed method addresses important challenges in microbiome network estimation, and can be effectively applied to discover various types of dependence relationships in microbial communities. The procedure we have developed, which we refer to as COZINE, is available online at https://github.com/MinJinHa/COZINE .