Chung Cheng-Han, Walter Michael H, Yang Luobin, Chen Shu-Chuan Grace, Winston Vern, Thomas Michael A
Department of Biological Sciences, Idaho State University, 921 South 8th Avenue, Pocatello, ID, 83209-8007, USA.
Department of Biology, University of Northern Iowa, 144 McCollum Science Hall, Cedar Falls, IA, 50614-0421, USA.
BMC Genomics. 2017 May 4;18(1):350. doi: 10.1186/s12864-017-3744-0.
Most tailed bacteriophages (phages) feature linear dsDNA genomes. Characterizing novel phages requires an understanding of complete genome sequences, including the definition of genome physical ends.
We sequenced 48 Bacillus cereus phage isolates and analyzed next-generation sequencing (NGS) data to resolve the genome configuration of these novel phages. Most assembled contigs featured reads that mapped to both contig ends, forming circularized contigs. Independent assemblies of 31 nearly identical I48-like Bacillus phage isolates showed that assembly programs tended to cleave circularized contigs at random positions. Currently available assemblers, however, could not report the underlying phage genome configuration from sequence data. To identify the genome configuration of sequenced phages in silico, we developed a terminus prediction method that uses 'neighboring coverage ratios' and 'read edge frequencies' computed from read alignment files. Termini were confirmed by primer walking and supported by phylogenetic inference of large DNA terminase protein sequences.
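The two statistics named above can be illustrated with a minimal sketch. The abstract does not give their exact formulas, so the definitions here are assumptions: the neighboring coverage ratio is taken as the depth at a position divided by the depth at the preceding position (with a pseudocount), and the read edge frequency as the fraction of reads whose alignment starts at a position. The function names, the `(start, end)` read representation, and the combined score are hypothetical and are not the Terminus package's API.

```python
from collections import Counter

def coverage(reads, length):
    """Per-position depth from 0-based, half-open (start, end) alignments."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth

def neighboring_coverage_ratios(depth, pseudocount=1.0):
    """Assumed definition: depth[i] / depth[i - 1], smoothed by a pseudocount."""
    ratios = [1.0]  # position 0 has no left neighbor on a linear contig
    for i in range(1, len(depth)):
        ratios.append((depth[i] + pseudocount) / (depth[i - 1] + pseudocount))
    return ratios

def read_edge_frequencies(reads, length):
    """Assumed definition: fraction of all reads that start at each position."""
    starts = Counter(start for start, _ in reads)
    total = len(reads)
    return [starts[i] / total if total else 0.0 for i in range(length)]

def candidate_terminus(reads, length):
    """Hypothetical scoring: position with the largest ratio * edge frequency."""
    ratios = neighboring_coverage_ratios(coverage(reads, length))
    edges = read_edge_frequencies(reads, length)
    return max(range(length), key=lambda i: ratios[i] * edges[i])

# Toy data: six reads share a start at position 5, mimicking a fixed terminus,
# while three background reads start at scattered positions.
toy_reads = [(5, 10)] * 6 + [(0, 6), (2, 8), (3, 9)]
best = candidate_terminus(toy_reads, 12)
print(best)
```

In the toy data, the position where many reads begin shows both a sharp rise in depth relative to its neighbor and a high share of read starts, which is the kind of signal a fixed genome end would be expected to leave in an alignment. Real alignments would also need strand handling and read ends, which this sketch omits.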
The Terminus package, using phage NGS data together with contig circularity, could efficiently identify the proximal positions of phage genome termini. Complete phage genome sequences allow potential packaging mechanisms to be proposed and enable more precise genome annotation.