Brody Thomas, Yavatkar Amarendra S, Park Dong Sun, Kuzin Alexander, Ross Jermaine, Odenwald Ward F
Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland, United States of America.
Division of Intramural Research Information Technology Program, NINDS, NIH, Bethesda, Maryland, United States of America.
PLoS Negl Trop Dis. 2017 Jun 16;11(6):e0005673. doi: 10.1371/journal.pntd.0005673. eCollection 2017 Jun.
BACKGROUND: Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest.
METHODOLOGY/PRINCIPAL FINDINGS: We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences, including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then either view individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses; and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome.
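The idea of superimposing many one-on-one alignments onto a single genome can be illustrated with a toy sketch. This is not EvoPrinter's algorithm or code: the function names (`snp_profile`, `classify`, `readout`), the labels, and the sequences are hypothetical, and the sketch assumes each strain has already been aligned to the reference at equal length without insertions. It only shows how per-position differences from many pairwise comparisons might be pooled to separate invariant positions from SNPs that are unique to one strain or shared by several.

```python
def snp_profile(reference, aligned_strains):
    """Pool pairwise differences: position -> {alternative base: [strain names]}."""
    profile = {}
    for name, seq in aligned_strains.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            # Skip matches, gaps and ambiguous calls (assumption: '-' and 'N').
            if base == ref_base or base in "-N":
                continue
            profile.setdefault(pos, {}).setdefault(base, []).append(name)
    return profile


def classify(profile):
    """Label each variable position 'unique' (one strain) or 'shared' (two or more)."""
    labels = {}
    for pos, alternatives in profile.items():
        carriers = max(len(strains) for strains in alternatives.values())
        labels[pos] = "shared" if carriers > 1 else "unique"
    return labels


def readout(reference, profile):
    """Uppercase = identical in every comparison, lowercase = variable in at least one."""
    return "".join(
        base.lower() if pos in profile else base.upper()
        for pos, base in enumerate(reference)
    )


# Hypothetical toy data, not real viral sequences.
reference = "ATGCGTAC"
strains = {
    "strainA": "ATGCGTAC",
    "strainB": "ATGTGTAC",
    "strainC": "ATGTGTAA",
}
profile = snp_profile(reference, strains)
labels = classify(profile)
print(readout(reference, profile))  # ATGcGTAc
print(labels)                       # {3: 'shared', 7: 'unique'}
```

In this toy case the substitution at position 3 is carried by two strains and would be a candidate lineage marker, while the substitution at position 7 is specific to one strain.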
CONCLUSIONS/SIGNIFICANCE: EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks and the identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.
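Because inosine is read as guanosine during sequencing, A-to-I editing typically shows up in a pairwise comparison as a cluster of A-to-G differences (or T-to-C on the complementary strand). The following is a minimal sketch of how such clusters might be flagged, not the differential SNP profiling used by EvoPrinter. The function names, the window size, the hit threshold, and the sequences are all assumptions chosen for illustration.

```python
from collections import Counter


def substitution_spectrum(reference, strain):
    """Count each (reference base, strain base) mismatch between two aligned sequences."""
    return Counter(
        (r, s)
        for r, s in zip(reference, strain)
        if r != s and r in "ACGT" and s in "ACGT"
    )


def a_to_g_windows(reference, strain, window=10, min_hits=5):
    """Return (start, count) for every window holding at least min_hits A-to-G changes."""
    hits = [
        i for i, (r, s) in enumerate(zip(reference, strain))
        if r == "A" and s == "G"
    ]
    flagged = []
    for start in range(len(reference) - window + 1):
        count = sum(1 for i in hits if start <= i < start + window)
        if count >= min_hits:
            flagged.append((start, count))
    return flagged


# Hypothetical toy sequences, aligned at equal length.
reference = "TTCCAAGAATACCTT"
strain = "TTCCGGGGGTGCCTC"

spectrum = substitution_spectrum(reference, strain)
flagged = a_to_g_windows(reference, strain)
print(spectrum[("A", "G")], spectrum[("T", "C")])  # 5 1
print(flagged)  # [(1, 5), (2, 5), (3, 5), (4, 5)]
```

A spectrum dominated by a single substitution type within a short region is only suggestive of hyper-editing; real thresholds would have to be calibrated against the background substitution rate of the strains being compared.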