Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA.
Mol Ecol. 2010 Dec;19(24):5555-65. doi: 10.1111/j.1365-294X.2010.04898.x. Epub 2010 Nov 3.
Pyrosequencing technologies have revolutionized how we describe and compare complex microbial communities. In 454 pyrosequencing data sets, the abundance of reads assigned to taxa or phylotypes is commonly interpreted as a measure of genic or taxon abundance, useful for quantitative comparisons of community similarity. However, potentially systematic biases inherent in sample processing, amplification and sequencing may alter read abundance and reduce the utility of quantitative metrics. Here, we examine the relationship between read abundance and biological abundance in a sample of house dust spiked with fungi of known identity in known quantities along a dilution gradient. Our results show differences of one order of magnitude in read abundance among species. Precision of quantification within species along the dilution gradient varied, with R² ranging from 0.96 to 0.54. The stringency of read-quality-based processing profoundly affected the abundance of one species containing long homopolymers, and did so in a manner biased by read orientation. Order-level composition of background environmental fungal communities determined from pyrosequencing data was comparable with that derived from cloning and Sanger sequencing and was not biased by read orientation. We conclude that read abundance is approximately quantitative within species, but between-species comparisons can be biased by innate sequence structure. Our results showed a trade-off between sequence quality stringency and quantification. Careful consideration of sequence processing methods and community analyses is warranted when testing hypotheses using read abundance data.
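As an illustration of the within-species precision measure reported above, the following is a minimal Python sketch that computes R² for read counts regressed on spiked quantity along a dilution gradient. It is not the authors' analysis: the log10 transform, the function names (`r_squared`, `within_species_r2`), and all numbers in the example are assumptions made up for demonstration.

```python
import math


def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For a one-predictor linear fit, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


def within_species_r2(spiked_quantity, read_counts):
    """R^2 of log10(read count) against log10(spiked quantity) for one species.

    Assumption: both axes are log-transformed because a dilution gradient
    spans orders of magnitude. Points with a zero value are skipped.
    """
    pairs = [
        (math.log10(q), math.log10(r))
        for q, r in zip(spiked_quantity, read_counts)
        if q > 0 and r > 0
    ]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return r_squared(xs, ys)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from the study.
    spiked_quantity = [1e2, 1e3, 1e4, 1e5]
    reads_by_species = {
        "species_A": [3, 31, 290, 3100],
        "species_B": [1, 40, 35, 900],
    }
    for name, reads in reads_by_species.items():
        print(name, round(within_species_r2(spiked_quantity, reads), 2))
```

An R² near 1 for a species indicates that its read counts track the dilution gradient closely. It says nothing about whether two species with equal spiked quantities yield equal read counts, which is the between-species bias the abstract describes.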