Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Biostatistics and Bioinformatics Branch, Division of Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20817, USA.
Bioinformatics. 2022 Jul 11;38(14):3493-3500. doi: 10.1093/bioinformatics/btac361.
Microbial communities have been shown to be associated with many complex diseases, such as cancers and cardiovascular diseases. Identifying differentially abundant taxa is clinically important: it can help explain the pathology of complex diseases and may suggest preventive and therapeutic strategies. Appropriate differential analysis of microbiome data is challenging because of the data's distinctive characteristics, including the compositional constraint, excessive zeros and high dimensionality. Most existing approaches either ignore these characteristics or account only for the compositional constraint, using log-ratio transformations with zero observations replaced by a pseudocount. However, there is no consensus on how to choose a pseudocount. More importantly, ignoring the excess of zeros may result in poorly powered analyses and therefore yield misleading findings.
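To make the pseudocount issue concrete, the following minimal sketch applies a centered log-ratio (CLR) transform to one sample, replacing zero counts with a pseudocount. The function name `clr_with_pseudocount` and the toy counts are illustrative assumptions, not part of the paper or of the MiDAT package.

```python
import math

def clr_with_pseudocount(counts, pseudocount):
    """Centered log-ratio transform of one sample.

    Zero counts are replaced by `pseudocount` before taking logs
    (an illustrative convention; other conventions add the
    pseudocount to every count).
    """
    filled = [c if c > 0 else pseudocount for c in counts]
    logs = [math.log(v) for v in filled]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is the same as dividing by the geometric mean.
    return [value - mean_log for value in logs]

# Toy sample with one unobserved taxon (hypothetical counts).
sample = [0, 10, 90]
for pc in (0.5, 1e-6):
    print(pc, [round(v, 2) for v in clr_with_pseudocount(sample, pc)])
```

Because the geometric mean depends on the value substituted for the zero, every transformed coordinate changes with the pseudocount, including those of the taxa that were actually observed.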
We develop a novel microbiome-based direction-assisted test for the detection of overall difference in microbial relative abundances between two health conditions, which simultaneously incorporates the characteristics of relative abundance data. The proposed test (i) divides the taxa into two clusters by the directions of mean differences of relative abundances and then combines them at cluster level, in light of the compositional characteristic; and (ii) contains a burden type test, which collapses multiple taxa into a single one to account for excessive zeros. Moreover, the proposed test is an adaptive procedure, which can accommodate high-dimensional settings and yield high power against various alternative hypotheses. We perform extensive simulation studies across a wide range of scenarios to evaluate the proposed test and show its substantial power gain over some existing tests. The superiority of the proposed approach is further demonstrated with real datasets from two microbiome studies.
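The two ideas in (i) and (ii) can be illustrated with a deliberately simplified sketch. It is not the MiDAT statistic, whose exact form and adaptive combination are not given in this abstract. It assumes a plain permutation test, splits taxa by the sign of the mean difference in relative abundance, and collapses each cluster into a single burden-type sum. The function names and the toy data are hypothetical.

```python
import random

def direction_burden_stat(group_x, group_y):
    """Illustrative statistic: split taxa by the sign of the mean
    difference, collapse each cluster into one summed difference,
    and add the two cluster-level magnitudes."""
    n_taxa = len(group_x[0])
    mean_x = [sum(s[j] for s in group_x) / len(group_x) for j in range(n_taxa)]
    mean_y = [sum(s[j] for s in group_y) / len(group_y) for j in range(n_taxa)]
    diffs = [mean_x[j] - mean_y[j] for j in range(n_taxa)]
    up = sum(d for d in diffs if d > 0)      # taxa more abundant in group_x
    down = -sum(d for d in diffs if d < 0)   # taxa more abundant in group_y
    return up + down

def permutation_pvalue(group_x, group_y, n_perm=999, seed=0):
    """Permutation p-value; the direction split is recomputed inside
    every permutation because it depends on the group labels."""
    rng = random.Random(seed)
    observed = direction_burden_stat(group_x, group_y)
    pooled = group_x + group_y  # new list; the inputs are not modified
    n_x = len(group_x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if direction_burden_stat(pooled[:n_x], pooled[n_x:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy relative abundances (rows sum to 1); hypothetical data.
cases = [[0.8, 0.2, 0.0] for _ in range(10)]
controls = [[0.2, 0.8, 0.0] for _ in range(10)]
print(permutation_pvalue(cases, controls, n_perm=199))
```

Collapsing within a direction cluster means a taxon that is zero in most samples still contributes to a pooled signal rather than being tested alone. Under the compositional constraint the positive and negative clusters carry equal total mean difference, which is one reason to combine evidence at the cluster level.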
An R package for MiDAT is available at https://github.com/zhangwei0125/MiDAT.
Supplementary data are available at Bioinformatics online.