Aziz Ramy K, Dwivedi Bhakti, Akhter Sajia, Breitbart Mya, Edwards Robert A
Department of Computer Science, San Diego State University, San Diego, CA, USA; Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt; Computing, Environment, and Life Sciences, Argonne National Laboratory, Argonne, IL, USA.
College of Marine Science, University of South Florida St. Petersburg, St. Petersburg, FL, USA.
Front Microbiol. 2015 May 8;6:381. doi: 10.3389/fmicb.2015.00381. eCollection 2015.
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.