Wang Shulei, Cai T Tony, Li Hongzhe
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.
Biometrika. 2020 Jul 11;108(1):17-36. doi: 10.1093/biomet/asaa061. eCollection 2021 Mar.
Quantitative comparison of microbial composition from different populations is a fundamental task in various microbiome studies. We consider two-sample testing for microbial compositional data by leveraging phylogenetic information. Motivated by existing phylogenetic distances, we take a minimum-cost flow perspective to study such testing problems. We first show that multivariate analysis of variance with permutation using phylogenetic distances, one of the most commonly used methods in practice, is essentially a sum-of-squares type of test and has better power for dense alternatives. However, empirical evidence from real datasets suggests that the phylogenetic microbial composition difference between two populations is usually sparse. Motivated by this observation, we propose a new maximum type test, detector of active flow on a tree, and investigate its properties. We show that the proposed method is particularly powerful against sparse phylogenetic composition difference and enjoys certain optimality. The practical merit of the proposed method is demonstrated by simulation studies and an application to a human intestinal biopsy microbiome dataset on patients with ulcerative colitis.
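The contrast drawn above between a sum-of-squares type test and a maximum type test can be illustrated with a small permutation sketch. This is not the authors' detector of active flow on a tree, and it does not use phylogenetic distances. It assumes each sample has already been summarized as a vector with one value per tree edge (for example, the relative abundance of the subtree below that edge). All function names are hypothetical. The sum-of-squares statistic pools evidence across all edges, so it suits dense differences; the maximum statistic responds to the single largest edge difference, so it suits sparse ones.

```python
import random
import statistics


def edge_t_stats(group_a, group_b):
    """Welch-style two-sample t statistic for each column (one column per edge).

    Assumption: each group is a list of samples, each sample a list of
    per-edge values, and each group has at least two samples.
    """
    stats = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        diff = statistics.mean(col_a) - statistics.mean(col_b)
        se = (statistics.variance(col_a) / len(col_a)
              + statistics.variance(col_b) / len(col_b)) ** 0.5
        # An edge with no variation in either group carries no signal here.
        stats.append(diff / se if se > 0 else 0.0)
    return stats


def sum_of_squares(t):
    """Sum-of-squares type statistic: aggregates evidence over all edges."""
    return sum(v * v for v in t)


def max_type(t):
    """Maximum type statistic: driven by the single largest edge difference."""
    return max(abs(v) for v in t)


def permutation_pvalue(group_a, group_b, combine, n_perm=500, seed=0):
    """Permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = combine(edge_t_stats(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = combine(edge_t_stats(pooled[:n_a], pooled[n_a:]))
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(a, b, sum_of_squares)` and `permutation_pvalue(a, b, max_type)` on the same data gives one p-value per statistic; when only a few edges differ between the groups, the maximum type version would typically be expected to give the smaller one.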