Computational and Mathematical Biology, Genome Institute of Singapore, Singapore.
Nucleic Acids Res. 2011 Mar;39(6):e34. doi: 10.1093/nar/gkq1232. Epub 2010 Dec 21.
Reassortments in the influenza virus, in which strains exchange genetic segments, have been implicated in two of the three pandemics of the 20th century as well as in the 2009 H1N1 outbreak. While advances in sequencing have led to an explosion in the number of available whole-genome sequences, an understanding of the rate and distribution of reassortments and of their role in viral evolution is still lacking. An important factor is the paucity of automated tools for confidently identifying reassortments from sequence data, owing to the challenges of analyzing large, uncertain viral phylogenies. We describe here a novel computational method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that robustly identifies reassortments in a fully automated fashion while accounting for uncertainties in the inferred phylogenies. The algorithms behind GiRaF search large collections of Markov chain Monte Carlo (MCMC)-sampled trees for groups of incompatible splits, using a fast biclique enumeration algorithm coupled with several statistical tests to identify sets of taxa with differential phylogenetic placement. GiRaF correctly finds known reassortments in human, avian, and swine influenza populations, including the evolutionary events that led to the recent 'swine flu' outbreak. In whole-genome studies cataloging events in H5N1 and swine influenza isolates, GiRaF also identifies several previously unreported reassortments.
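The notion of "incompatible splits" between trees inferred for different segments can be illustrated with a short sketch. This is not GiRaF's implementation; it only shows the standard compatibility test for two bipartitions of the same taxon set (they are compatible when at least one of the four pairwise intersections of their sides is empty) and collects the incompatible pairs as edges of a bipartite graph. The function names, the toy taxa, and the two example split lists are assumptions made for illustration; biclique enumeration and the statistical tests are omitted.

```python
from itertools import product


def is_compatible(split_a, split_b, taxa):
    """Return True if two splits of the same taxon set can sit in one tree.

    Each split is given by one side of the bipartition; the other side is
    its complement in `taxa`. The splits are compatible when at least one
    of the four side-by-side intersections is empty.
    """
    a, b = frozenset(split_a), frozenset(split_b)
    a_c, b_c = taxa - a, taxa - b
    return any(not (x & y) for x, y in product((a, a_c), (b, b_c)))


def incompatibility_edges(splits_1, splits_2, taxa):
    """List index pairs (i, j) where splits_1[i] conflicts with splits_2[j].

    The pairs are the edges of a bipartite incompatibility graph between
    the splits of two segments (hypothetical helper, for illustration).
    """
    taxa = frozenset(taxa)
    return [
        (i, j)
        for i, s in enumerate(splits_1)
        for j, t in enumerate(splits_2)
        if not is_compatible(s, t, taxa)
    ]


# Toy data (assumed): splits from trees of two different segments.
taxa = {"A", "B", "C", "D", "E"}
seg1 = [{"A", "B"}, {"D", "E"}]  # splits of the tree ((A,B),C,(D,E))
seg2 = [{"A", "C"}, {"D", "E"}]  # splits of the tree ((A,C),B,(D,E))

edges = incompatibility_edges(seg1, seg2, taxa)
print(edges)
```

In this toy case only the pair {A,B} and {A,C} conflicts, which points at taxa whose placement differs between the two segments; a real analysis would aggregate such conflicts over many sampled trees before drawing any conclusion.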