School of Mathematical Sciences, Peking University, Beijing, China.
Center for Statistical Science, LMAM, School of Mathematical Sciences, Peking University, Beijing, China.
Bioinformatics. 2019 Sep 15;35(18):3404-3411. doi: 10.1093/bioinformatics/btz098.
With the development of high-throughput sequencing techniques for 16S-rRNA gene profiling, the analysis of microbial communities is becoming increasingly attractive and reliable. Inferring the direct interaction network among microbial communities helps identify the mechanisms underlying community structure. However, the analysis of compositional data remains challenging because such data convey only relative information and are high-dimensional.
In this article, we first propose a novel loss function for compositional data, called CD-trace, based on the D-trace loss. A sparse matrix estimator for the direct interaction network is defined as the minimizer of the lasso-penalized CD-trace loss under a positive-definite constraint. An efficient alternating direction algorithm is developed for numerical computation. Simulation results show that CD-trace compares favorably to gCoda and that it is better than sparse inverse covariance estimation for ecological association inference (SPIEC-EASI) (hereinafter S-E) in network recovery with compositional data. Finally, we test CD-trace and compare it to the other methods noted above using mouse skin microbiome data.
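As a rough illustration of the ingredients named above, the sketch below evaluates the standard D-trace loss, 0.5 * tr(Theta^2 Sigma) - tr(Theta), together with a lasso penalty and the soft-thresholding step that alternating direction methods typically use. It is not the CD-trace loss itself, whose compositional adjustment is not given in this abstract. The centered log-ratio (clr) transform, the pseudocount, the off-diagonal-only penalty and all function names are assumptions made for illustration.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount (an assumed choice) avoids taking log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    comp = x / x.sum(axis=1, keepdims=True)  # counts -> compositions
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)


def d_trace_loss(theta, sigma):
    """Standard D-trace loss: 0.5 * tr(theta^2 sigma) - tr(theta)."""
    return 0.5 * np.trace(theta @ theta @ sigma) - np.trace(theta)


def lasso_penalty(theta, lam):
    """L1 penalty on off-diagonal entries only (assumed convention)."""
    off_diag = theta - np.diag(np.diag(theta))
    return lam * np.abs(off_diag).sum()


def soft_threshold(a, t):
    """Elementwise soft-thresholding, the proximal map of the L1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


if __name__ == "__main__":
    # For a positive-definite sigma, the unpenalized D-trace loss is
    # minimized at the inverse of sigma.
    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    theta_star = np.linalg.inv(sigma)
    print(d_trace_loss(theta_star, sigma))
    print(d_trace_loss(theta_star + 0.1 * np.eye(2), sigma))
```

For a positive-definite covariance matrix the unpenalized D-trace loss is minimized at the precision matrix, which is why it is a natural starting point for sparse network estimation. The covariance of clr-transformed data is singular, so a compositional variant has to handle that case; this sketch does not.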
CD-trace is open source and freely available from https://github.com/coamo2/CD-trace under the GNU LGPL v3.
Supplementary data are available at Bioinformatics online.