Department of Computer Science, EPFL (Swiss Federal Institute of Technology), Lausanne, CH-1015, Switzerland.
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2164-12-S2-S3. Epub 2011 Jul 27.
Reassortments are events in the evolution of the influenza (flu) genome in which segments of the genome are exchanged between different strains. Because reassortments have been implicated in major human pandemics of the last century, their identification has become a health priority. While such identification can be done "by hand" on a small dataset, researchers and health authorities are building up enormous databases of genomic sequences for every flu strain, which makes automated identification methods imperative. However, current methods are limited to pairwise segment comparisons.
We present FluReF, a fully automated flu virus reassortment finder. FluReF is inspired by the visual approach to reassortment identification and uses the reconstructed phylogenetic trees of the individual segments and of the full genome. We also present a simple flu evolution simulator, based on the current source-sink hypothesis for flu cycles. On synthetic datasets produced by our simulator, FluReF, tuned for a 0% false positive rate, yielded false negative rates of less than 10%. FluReF corroborated two new reassortments identified by visual analysis of 75 human H3N2 flu strains collected in New York from 2005 to 2008, and partially verified reassortments found using another bioinformatics method.
FluReF finds reassortments by a bottom-up search of the full-genome and segment-based phylogenetic trees for candidate clades, that is, groups of one or more sampled viruses that are separated from the other variants of the same season. Candidate clades in each tree are tested to ensure sufficient confidence, using the lengths of key edges as well as other tree parameters; clades with reassortments must have validated incongruencies among segment trees.
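The incongruence idea described above can be illustrated with a minimal Python sketch. This is not FluReF's implementation: the nested-tuple tree representation, the function names `clades` and `incongruent_segments`, and the toy strains and segment trees are all assumptions made for illustration, and the sketch omits the edge-length and confidence tests entirely. It only shows how a bottom-up walk can collect the clades of rooted trees and how a candidate clade can be checked for presence in each segment tree.

```python
from typing import Dict, FrozenSet, List, Set, Union

# Hypothetical representation: a rooted tree is a leaf name (str)
# or a tuple of child subtrees. Edge lengths are ignored here.
Tree = Union[str, tuple]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set of every subtree via a bottom-up (post-order) walk."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            # Children are visited first, then merged into the parent's leaf set.
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def incongruent_segments(
    candidate: FrozenSet[str], segment_trees: Dict[str, Tree]
) -> List[str]:
    """Return the segments whose tree does not contain the candidate as a clade."""
    return sorted(
        name for name, tree in segment_trees.items()
        if candidate not in clades(tree)
    )


# Toy data (invented): strains A and B group together in the HA tree
# but are split apart in the NA tree.
segment_trees: Dict[str, Tree] = {
    "HA": (("A", "B"), ("C", "D")),
    "NA": (("A", "C"), ("B", "D")),
}
candidate = frozenset({"A", "B"})
flagged = incongruent_segments(candidate, segment_trees)
```

In this toy case the candidate clade {A, B} is present in the HA tree and absent from the NA tree, so only the NA segment is flagged. A real method would additionally have to decide whether such a disagreement is supported strongly enough to be more than tree-reconstruction noise, which is the role the abstract assigns to edge lengths and other tree parameters.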
FluReF demonstrates robustness of prediction for geographically and temporally expanded datasets, and is not limited to finding reassortments with previously collected sequences. The complete source code is available from http://lcbb.epfl.ch/software.html.