Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, Ames, IA, 50010, USA.
Microbiome. 2013 Feb 4;1(1):5. doi: 10.1186/2049-2618-1-5.
Viruses are important drivers of ecosystem functions, yet little is known about the vast majority of viruses. Viral shotgun metagenomics enables the investigation of broad ecological questions in phage communities. One such ecological characteristic is species richness, the number of different species in a community. Viruses lack a phylogenetic marker analogous to the bacterial 16S rRNA gene with which to estimate richness, so contig spectra are used instead to measure the number of virus taxa in a given community. A contig spectrum is generated from a viral shotgun metagenome by assembling the random sequence reads into groups of overlapping sequences (contigs) and counting the number of sequences that fall within each contig. Current tools for estimating phage richness from contig spectra are limited by their reliance on rank-abundance data.
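To make the two data views concrete, the sketch below builds a contig spectrum from a list of per-contig read counts and contrasts it with the rank-abundance view. The input list and the function names `contig_spectrum` and `rank_abundance` are hypothetical, chosen only for illustration. As a simplifying assumption, a read that did not assemble is listed as a contig of size 1.

```python
from collections import Counter

def contig_spectrum(reads_per_contig):
    """Return {k: number of contigs assembled from exactly k reads}."""
    # Assumption: unassembled reads appear in the input as contigs of size 1.
    counts = Counter(reads_per_contig)
    return dict(sorted(counts.items()))

def rank_abundance(reads_per_contig):
    """Return contig sizes sorted from largest to smallest (the rank-abundance view)."""
    return sorted(reads_per_contig, reverse=True)

# Hypothetical per-contig read counts from an assembly (illustrative only).
reads_per_contig = [1, 1, 1, 2, 1, 3, 2, 1, 5]

print(contig_spectrum(reads_per_contig))  # {1: 5, 2: 2, 3: 1, 5: 1}
print(rank_abundance(reads_per_contig))   # [5, 3, 2, 2, 1, 1, 1, 1, 1]
```

Both outputs carry the same information, but the spectrum form (how many contigs were seen exactly k times) is the frequency count layout that richness estimators operate on.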
We present statistical estimates of virus richness from contig spectra. The program CatchAll (http://www.northeastern.edu/catchall/) was used to analyze contig spectra as frequency count data rather than rank-abundance data, thus enabling formal statistical analyses. In addition, the influence of potentially spurious low-frequency counts on richness estimates was minimized by two methods, one empirical and one statistical. The resulting estimates of viral richness were greater than previous calculations in nearly all environments analyzed, including swine feces and reclaimed fresh water.
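As an illustration of what an estimate computed from frequency counts looks like, the sketch below applies the bias-corrected Chao1 estimator to a contig spectrum. Chao1 is used here only as a familiar example of an estimator that consumes frequency counts; it is not presented as CatchAll's method or its preferred model. Treating each contig as one taxon is likewise a simplifying assumption, and the spectrum values are hypothetical.

```python
def chao1(spectrum):
    """Bias-corrected Chao1 richness estimate from frequency counts.

    spectrum maps k -> f_k, the number of taxa (here, contigs) observed exactly k times.
    """
    s_obs = sum(spectrum.values())  # observed richness
    f1 = spectrum.get(1, 0)         # singletons
    f2 = spectrum.get(2, 0)         # doubletons
    # Unseen taxa are extrapolated from the singleton/doubleton balance.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

spectrum = {1: 5, 2: 2, 3: 1, 5: 1}  # hypothetical contig spectrum
print(round(chao1(spectrum), 2))     # 12.33

# Replacing f1 = 5 with f1 = 10 (e.g. five spurious singletons) raises the estimate sharply.
inflated = {**spectrum, 1: 10}
print(chao1(inflated))               # 29.0
```

The jump from about 12 to 29 shows why spurious low-frequency counts deserve attention: estimators of this kind lean heavily on the singleton count f1.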
CatchAll yielded consistent estimates of richness across viral metagenomes from the same or similar environments. Additionally, analysis of pooled viral metagenomes from different environments via mixed contig spectra resulted in greater richness estimates than those of the component metagenomes. Using CatchAll to analyze contig spectra will improve estimates of richness from viral shotgun metagenomes, particularly for large datasets, by providing statistical measures of richness.