Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Office E3622, Baltimore, MD, 21205, USA.
Microbiome. 2020 May 11;8(1):63. doi: 10.1186/s40168-020-00834-9.
In human microbiome studies, it is crucial to evaluate the association between the composition of a microbial group (e.g., a community or clade) and a host phenotype of interest. In response, a number of microbial group association tests have been proposed that account for the unique features of microbiome data (e.g., high dimensionality, compositionality, phylogenetic relationships). These tests generally fall into the class of aggregation tests, which amplify the overall group association by combining all the underlying microbial association signals; they are therefore powerful when many microbial species are associated with a given host phenotype (i.e., low sparsity). In practice, however, the microbial association signals can be highly sparse, and this is precisely the situation in which a microbial group association is difficult to detect.
Here, we introduce microbiome higher criticism analysis (MiHC), a powerful microbial group association test for sparse microbial association signals. MiHC is a data-driven omnibus test whose search space consists of higher criticism tests tailored to incorporate phylogenetic information and/or to modulate sparsity levels, together with the Simes test for excessively high sparsity levels. As a result, MiHC adapts robustly to diverse levels of phylogenetic relevance and sparsity.
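For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of the classical (Donoho-Jin) higher criticism statistic and the Simes combined p-value, computed from a list of per-taxon p-values. It is not the MiHC implementation: the phylogenetic weighting, the sparsity modulation, and the omnibus step described in the abstract are omitted, and the function names `higher_criticism` and `simes_pvalue` and the `alpha0` parameter are illustrative assumptions rather than names from the MiHC package.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Classical higher criticism statistic (illustrative, not MiHC itself).

    Scans the smallest alpha0 fraction of the sorted p-values and returns the
    largest standardized excess of the empirical fraction i/n over p_(i).
    Returns -inf if every scanned p-value is exactly 0 or 1.
    """
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    k_max = max(1, int(alpha0 * n))  # number of order statistics to scan
    best = -math.inf
    for i, p in enumerate(p_sorted[:k_max], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardizing variance p(1-p) would be zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def simes_pvalue(pvalues):
    """Simes combined p-value: min over i of n * p_(i) / i, capped at 1."""
    p_sorted = sorted(pvalues)
    n = len(p_sorted)
    return min(1.0, min(n * p / i for i, p in enumerate(p_sorted, start=1)))


# Hypothetical p-values: one strong signal among otherwise null-looking taxa.
sparse = [0.001, 0.5, 0.6, 0.8]
null_like = [0.2, 0.4, 0.6, 0.8]
hc_sparse = higher_criticism(sparse)
hc_null = higher_criticism(null_like)
simes_sparse = simes_pvalue(sparse)
```

In this toy input a single small p-value drives both statistics, which is the sparse-signal regime the abstract targets; an aggregation test that averages over all taxa would dilute that one signal.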
Our simulations show that MiHC maintains high power across different levels of phylogenetic relevance and sparsity while correctly controlling the type I error rate. We also apply MiHC to four real microbiome datasets to test the association between the respiratory tract microbiome and smoking status, between the infant gut microbiome and delivery mode, between the gut microbiome and type 1 diabetes status, and between the gut microbiome and human immunodeficiency virus status.
In practice, the true underlying association pattern, in terms of phylogenetic relevance and sparsity, is usually unknown. MiHC can therefore be a useful analytic tool because it adapts well to diverse levels of both. MiHC can be run in the R computing environment using our software package, freely available at https://github.com/hk1785/MiHC.