Joint Genome Institute, Department of Energy, Walnut Creek, California, USA.
Nat Protoc. 2017 Aug;12(8):1673-1682. doi: 10.1038/nprot.2017.063. Epub 2017 Jul 27.
The analysis of large microbiome data sets holds great promise for the delineation of the biological and metabolic functioning of living organisms and their role in the environment. In the midst of this genomic puzzle, viruses, especially those that infect microbial communities, represent a major reservoir of genetic diversity with great impact on biogeochemical cycles and organismal health. Overcoming the limitations associated with virus detection directly from microbiomes can provide key insights into how ecosystem dynamics are modulated. Here, we present a computational protocol for accurate detection and grouping of viral sequences from microbiome samples. Our approach relies on an expanded and curated set of viral protein families used as bait to identify viral sequences directly from metagenomic assemblies. This protocol describes how to use the viral protein families catalog (∼7 h) and recommended filters for the detection of viral contigs in metagenomic samples (∼6 h), and it describes the specific parameters for a nucleotide-sequence-identity-based method of organizing the viral sequences into quasi-species taxonomic-level groups (∼10 min).
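The grouping step described above can be illustrated with a minimal sketch of single-linkage clustering over pairwise nucleotide alignments. This is not the protocol's implementation. The function name `group_contigs`, the input tuple layout, and the default thresholds (`min_identity`, `min_coverage`) are assumptions made for illustration, since the abstract does not state the actual parameter values.

```python
from collections import defaultdict


def group_contigs(contig_ids, hits, min_identity=90.0, min_coverage=75.0):
    """Group contigs by single-linkage clustering of pairwise alignments.

    contig_ids: iterable of contig names.
    hits: iterable of (contig_a, contig_b, percent_identity, percent_coverage).
    The thresholds are hypothetical placeholders, not values from the abstract.
    """
    # Union-find structure: every contig starts as its own group.
    parent = {c: c for c in contig_ids}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if a not in parent or b not in parent:
            continue  # ignore hits to contigs outside the input set
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return sorted(sorted(members) for members in groups.values())


contigs = ["c1", "c2", "c3", "c4"]
pairwise_hits = [
    ("c1", "c2", 95.0, 80.0),  # passes both thresholds
    ("c2", "c3", 92.0, 76.0),  # passes both thresholds, links c3 to c1 via c2
    ("c3", "c4", 85.0, 90.0),  # identity below threshold, no link
]
viral_groups = group_contigs(contigs, pairwise_hits)
print(viral_groups)
```

Single linkage means two contigs end up in the same group whenever a chain of qualifying alignments connects them, even if the two are not directly aligned to each other. In practice the pairwise hits would come from an all-versus-all nucleotide alignment of the detected viral contigs.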