Shi Pixu, Li Hongzhe
Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, U.S.A.
Biometrics. 2017 Dec;73(4):1266-1278. doi: 10.1111/biom.12681. Epub 2017 Mar 30.
In human microbiome studies, sequencing read data are often summarized as counts of bacterial taxa at various taxonomic levels specified by a taxonomic tree. This article considers the problem of analyzing two repeated measurements of microbiome data from the same subjects. Such data are often collected to assess the change in microbial composition after a treatment, or the difference in microbial composition across body sites. Existing models for such count data are limited in modeling the covariance structure of the counts and in handling paired multinomial count data. A new probability distribution is proposed for paired-multinomial count data, which allows a flexible covariance structure and can be used to model repeatedly measured multivariate count data. Based on this distribution, a test statistic is developed for testing the difference in compositions based on paired multinomial count data. The proposed test can be applied to count data observed on a taxonomic tree in order to test for differences in microbiome composition and to identify the subtrees with different subcompositions. Simulation results indicate that the proposed test has the correct type I error rate and increased power compared to some commonly used methods. An analysis of an upper respiratory tract microbiome data set is used to illustrate the proposed methods.
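The abstract does not give the form of the proposed paired-multinomial distribution or its test statistic, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, the data setup the abstract describes: leaf-level read counts are summed up a taxonomic tree, the children of each internal node define a subcomposition, and the two samples from each subject are compared node by node. The test used here is a generic sign-flip permutation test swapped in for illustration; it is not the authors' method. The toy tree, taxon counts, and function names (`aggregate`, `subcomposition`, `paired_sign_flip_test`) are hypothetical.

```python
import random

# Hypothetical toy taxonomic tree: each internal node maps to its children.
# Any node that is not a key is a leaf whose read count is observed directly.
TREE = {
    "root": ["Firmicutes", "Bacteroidetes"],
    "Firmicutes": ["Streptococcus", "Staphylococcus"],
    "Bacteroidetes": ["Prevotella", "Bacteroides"],
}


def aggregate(tree, node, leaf_counts, out):
    """Store the total read count under every node in `out`; return the count at `node`."""
    if node not in tree:  # leaf: observed count, 0 if the taxon is absent
        total = leaf_counts.get(node, 0)
    else:
        total = sum(aggregate(tree, child, leaf_counts, out) for child in tree[node])
    out[node] = total
    return total


def subcomposition(tree, node, leaf_counts):
    """Proportions of the children of `node`, i.e. the composition at that split."""
    totals = {}
    aggregate(tree, node, leaf_counts, totals)
    n = totals[node]
    if n == 0:
        raise ValueError(f"no reads under node {node!r}")
    return [totals[child] / n for child in tree[node]]


def paired_sign_flip_test(tree, node, pairs, n_perm=2000, seed=0):
    """Generic sign-flip permutation test (NOT the paper's statistic).

    `pairs` holds one (counts_first, counts_second) tuple per subject. The
    statistic is the squared norm of the mean within-subject difference in
    subcomposition at `node`. Under H0 each subject's difference is assumed
    symmetric about zero, so its sign can be flipped at random.
    """
    diffs = []
    for first, second in pairs:
        p0 = subcomposition(tree, node, first)
        p1 = subcomposition(tree, node, second)
        diffs.append([b - a for a, b in zip(p0, p1)])

    def stat(signs):
        k = len(diffs[0])
        means = [sum(s * d[j] for s, d in zip(signs, diffs)) / len(diffs)
                 for j in range(k)]
        return sum(m * m for m in means)

    observed = stat([1] * len(diffs))
    rng = random.Random(seed)
    exceed = sum(
        stat([rng.choice((-1, 1)) for _ in diffs]) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the permutation p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage on made-up counts: test every internal node, as one might when
# looking for subtrees whose subcompositions differ between the two samples.
before = {"Streptococcus": 50, "Staphylococcus": 10, "Prevotella": 30, "Bacteroides": 10}
after = {"Streptococcus": 20, "Staphylococcus": 40, "Prevotella": 30, "Bacteroides": 10}
for node in TREE:
    t, p = paired_sign_flip_test(TREE, node, [(before, after)] * 8)
    print(node, round(t, 4), round(p, 4))
```

Testing each internal node separately mirrors the idea of locating subtrees with different subcompositions; in practice the per-node p-values would also need a multiplicity adjustment, which this sketch omits.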