Chen Eric Z, Bushman Frederic D, Li Hongzhe
Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Department of Microbiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Stat Biosci. 2017 Jun;9(1):13-27. doi: 10.1007/s12561-016-9148-x. Epub 2016 May 16.
The human microbiome, the collection of microbes residing in or on the human body, has a profound influence on human health. DNA sequencing technology, in particular shotgun metagenomic sequencing, has made large-scale human microbiome studies possible. An important step in analyzing such metagenomic data is quantifying bacterial abundances from the sequencing reads. Existing methods almost always quantify these abundances one sample at a time, which ignores systematic differences in read coverage along the genomes due to GC content, copy number variation, and the bacterial origin of replication. To account for such differences in read counts, we propose a multi-sample Poisson model that quantifies microbial abundances from read counts assigned to species-specific taxonomic markers. Our model takes marker-specific effects into account when normalizing the sequencing count data, in order to obtain more accurate quantification of species abundances. Compared with currently available methods on simulated and real data sets, our method demonstrates improved accuracy in bacterial abundance quantification, which leads to more biologically interesting results in downstream data analysis.
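To make the idea of a multi-sample Poisson model with marker-specific effects concrete, the following is a minimal sketch and not the authors' exact formulation. It assumes that, for a single species, the count for marker i in sample j is Poisson with mean L_i * N_j * phi_i * theta_j, where L_i is the marker length, N_j the sequencing depth, phi_i a marker effect shared across samples, and theta_j the species abundance in sample j. The function name, the parameterization, and the constraint that the marker effects average to 1 are all assumptions made for illustration.

```python
import numpy as np


def fit_multisample_poisson(counts, marker_len, depth):
    """Maximum-likelihood fit of the assumed model
        counts[i, j] ~ Poisson(marker_len[i] * depth[j] * phi[i] * theta[j])
    for one species, pooling all samples to estimate the marker effects.

    Hypothetical helper for illustration; assumes a complete count matrix
    with at least one nonzero count. Returns (phi, theta), mean(phi) == 1.
    """
    counts = np.asarray(counts, dtype=float)
    marker_len = np.asarray(marker_len, dtype=float)
    depth = np.asarray(depth, dtype=float)

    row_tot = counts.sum(axis=1)  # reads per marker, pooled over samples
    col_tot = counts.sum(axis=0)  # reads per sample, pooled over markers

    # Score equation for phi: phi_i is proportional to row_tot_i / L_i.
    # The overall scale is not identifiable, so fix mean(phi) = 1.
    phi = row_tot / marker_len
    phi = phi / phi.mean()

    # Score equation for theta given phi:
    # theta_j = col_tot_j / (N_j * sum_i L_i * phi_i).
    theta = col_tot / (depth * np.sum(marker_len * phi))
    return phi, theta


# Simulated example with made-up values (not data from the paper).
rng = np.random.default_rng(0)
marker_len = np.array([800.0, 1200.0, 1000.0])
depth = np.array([1e6, 2e6, 5e5, 3e6])
phi_true = np.array([0.5, 1.0, 1.5])          # marker effects, mean 1
theta_true = np.array([1e-6, 3e-6, 2e-6, 5e-7])

expected = np.outer(marker_len * phi_true, depth * theta_true)
counts = rng.poisson(expected)
phi_hat, theta_hat = fit_multisample_poisson(counts, marker_len, depth)
print("estimated marker effects:", phi_hat)
print("estimated abundances:", theta_hat)
```

Under this simplified model the marker effects are estimated from the reads pooled over all samples, which is the information a one-sample-at-a-time analysis cannot use. The abundances are identified only up to the scale fixed by the constraint on phi.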