APC Microbiome Ireland, University College Cork, County Cork, Ireland.
School of Microbiology, University College Cork, County Cork, Ireland.
Sci Adv. 2020 Feb 7;6(6):eaay5981. doi: 10.1126/sciadv.aay5981. eCollection 2020 Feb.
The first sequenced genome was that of the 3569-nucleotide single-stranded RNA (ssRNA) bacteriophage MS2. Despite the recent accumulation of vast amounts of DNA and RNA sequence data, only 12 representative ssRNA phage genome sequences were available from the NCBI Genome database as of June 2019. The difficulty of detecting RNA phages in metagenomic datasets raises questions about their abundance, taxonomic structure, and ecological importance. In this study, we iteratively applied profile hidden Markov models to detect conserved ssRNA phage proteins in 82 publicly available metatranscriptomic datasets generated from activated sludge and aquatic environments. We identified 15,611 nonredundant ssRNA phage sequences, including 1015 near-complete genomes. This expansion in the number of known sequences enabled us to complete a phylogenetic assessment of both the sequences identified in this study and known ssRNA phage genomes. Expanding the number of known ssRNA phage sequences this much from only two environments suggests that they have been overlooked in microbiome studies.
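The iterative search idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use profile hidden Markov models: as a loudly simplified stand-in, the "profile" is just a set of k-mers, and a sequence counts as a hit when enough of its k-mers occur in that set. The function names, the toy sequences, and the `threshold` and `max_rounds` parameters are all assumptions made for illustration. What the sketch does share with an iterative profile search is the loop structure: search, add the new hits to the profile, and repeat until no new hits appear.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_search(seeds: Dict[str, str], database: Dict[str, str],
                     k: int = 3, threshold: float = 0.5,
                     max_rounds: int = 10) -> Set[str]:
    """Toy iterative search (k-mer set as a stand-in for a profile HMM).

    Each round, every not-yet-found database sequence is scored by the
    fraction of its k-mers present in the current profile. Hits are merged
    into the profile, and the search repeats until a round finds nothing new.
    """
    # Build the initial profile from the seed sequences only.
    profile: Set[str] = set()
    for seq in seeds.values():
        profile |= kmers(seq, k)

    hits: Set[str] = set()
    for _ in range(max_rounds):
        new_hits: Set[str] = set()
        for name, seq in database.items():
            if name in hits:
                continue
            seq_kmers = kmers(seq, k)
            if seq_kmers and len(seq_kmers & profile) / len(seq_kmers) >= threshold:
                new_hits.add(name)
        if not new_hits:
            break  # converged: the profile no longer recruits new sequences
        hits |= new_hits
        # Widen the profile with the newly recruited sequences.
        for name in new_hits:
            profile |= kmers(database[name], k)
    return hits


# Hypothetical toy data: "distant" shares k-mers with "close" but not the seed.
seeds = {"seed": "MKTAYIAKQR"}
database = {
    "close": "MKTAYIGGHW",
    "distant": "YIGGHWPL",
    "unrelated": "CCCCDDDDEE",
}
found_one_round = iterative_search(seeds, database, max_rounds=1)
found_iterative = iterative_search(seeds, database)
print(sorted(found_one_round), sorted(found_iterative))
```

In this toy data, the sequence named `distant` is recruited only after `close` has been merged into the profile, which is the reason for iterating: each round can reach sequences that the original seeds alone would miss.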