School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia.
BMC Bioinformatics. 2012 Aug 20;13:208. doi: 10.1186/1471-2105-13-208.
Influenza is one of the oldest and deadliest infectious diseases known to humankind. Reassorted strains of the virus pose the greatest risk to both human and animal health. They have been associated with all pandemics of the past century, which resulted in tens of millions of deaths, with the possible exception of the 1918 pandemic. We have developed and tested new computer algorithms, FluShuffle and FluResort, which enable reassorted viruses to be identified rapidly and directly. Rapid identification of reassorted influenza viruses, and of other viruses, allows prevention strategies and treatments to be implemented more efficiently.
The FluShuffle and FluResort algorithms were tested with both experimental and simulated mass spectra of whole virus digests. FluShuffle considers different combinations of viral protein identities that match the mass spectral data, using a Gibbs sampling algorithm that employs a mixed protein Markov chain Monte Carlo (MCMC) method. FluResort uses those identities to calculate the weighted distance of each across two or more different phylogenetic trees constructed from viral protein sequence alignments. Each weighted mean distance value is normalized by conversion to a Z-score in order to identify a reassorted strain.
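The normalization step described above can be illustrated with a minimal sketch. It is not the FluResort implementation: the abstract does not specify the weighting scheme or the reference distribution, so the function names, the use of per-identity weights (for example, sampling frequencies from FluShuffle), and the comparison against a background set of distances are all assumptions made for illustration.

```python
from statistics import mean, stdev


def weighted_mean_distance(distances, weights):
    """Weighted mean of tree distances.

    `distances` are hypothetical distances between candidate protein
    identities across phylogenetic trees; `weights` are hypothetical
    confidence weights for those identities (an assumption).
    """
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(distances, weights)) / total


def z_score(value, background):
    """Standard score of `value` relative to a background sample.

    `background` is assumed to hold weighted mean distances for
    strains treated as the reference; the sample standard deviation
    is used.
    """
    mu = mean(background)
    sigma = stdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance")
    return (value - mu) / sigma


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    d = weighted_mean_distance([0.1, 0.5], [0.75, 0.25])
    z = z_score(d, [0.02, 0.03, 0.04, 0.05])
    print(f"weighted mean distance = {d:.3f}, Z = {z:.2f}")
```

Under these assumptions, a strain whose proteins sit far apart across the trees relative to the background would receive a large positive Z-score, which is the kind of signal the abstract associates with reassortment. Any decision threshold would have to come from the full paper.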
The new FluShuffle and FluResort algorithms can correctly identify the origins of influenza viral proteins and the number of reassortment events required to produce the strains from the high resolution mass spectral data of whole virus proteolytic digestions. This has been demonstrated in the case of constructed vaccine strains as well as common human seasonal strains of the virus. The algorithms significantly improve the capability of the proteotyping approach to identify reassorted viruses that pose the greatest pandemic risk.