Department of Mathematics.
College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Bioinformatics. 2017 Oct 15;33(20):3195-3201. doi: 10.1093/bioinformatics/btx390.
Low-rank matrix completion has been shown to be powerful in predicting antigenic distances among influenza viruses and vaccines from a partially revealed hemagglutination inhibition table. Influenza hemagglutinin (HA) protein sequences are also effective in inferring antigenic distances. It is therefore natural to integrate HA protein sequence information into a low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development.
We propose a novel algorithm called biological matrix completion with side information (BMCSI), which first measures HA protein sequence similarities among influenza viruses (especially on epitopes) and then integrates the similarity information into a low-rank matrix completion model to predict influenza antigenicity. The algorithm exploits both the correlations among viruses and vaccines in serological tests and the power of HA sequences in predicting influenza antigenicity. We applied this model to H3N2 seasonal influenza virus data. Compared with previous methods, it significantly reduced the prediction root-mean-square error in a 10-fold cross-validation analysis. Based on the cartographies constructed from imputed data, we showed that the antigenic evolution of H3N2 seasonal influenza is generally S-shaped, while the genetic evolution is half-circle shaped. We also showed that the Spearman correlation between genetic and antigenic distances (among antigenic clusters) is 0.83, indicating a globally high correspondence, with some local discrepancies, between influenza genetic and antigenic evolution. Finally, we showed that, historically, a genetic variance of 4.4% ± 1.2% (corresponding to 3.11 ± 1.08 antigenic distances) caused an antigenic drift event for H3N2 influenza viruses.
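The idea of combining a low-rank model with sequence-derived similarity can be illustrated with a minimal sketch. This is not the BMCSI implementation: the abstract does not give the objective function, so the sketch swaps in a generic technique, a rank-constrained factorization fitted by gradient descent with a graph-Laplacian penalty that pulls the latent factors of similar viruses together. The function names `complete_with_side_info` and `observed_rmse`, all hyperparameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def complete_with_side_info(D, S, rank=2, lam=0.01, mu=0.1, lr=0.01,
                            epochs=2000, seed=0):
    """Fill missing entries (None) of D, a virus-by-serum distance table.

    Hypothetical objective (an assumption, not the paper's formulation):
      0.5 * sum over observed (u_i . v_j - D_ij)^2
      + (lam / 2) * (||U||^2 + ||V||^2)
      + (mu / 4) * sum_ij S_ij * ||u_i - u_j||^2
    S is a symmetric, non-negative virus-virus similarity matrix.
    """
    rng = random.Random(seed)
    n, m = len(D), len(D[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m)
                if D[i][j] is not None]

    for _ in range(epochs):
        # Gradient of the ridge terms.
        gU = [[lam * x for x in row] for row in U]
        gV = [[lam * x for x in row] for row in V]
        # Gradient of the squared error on observed entries only.
        for i, j in observed:
            err = sum(U[i][k] * V[j][k] for k in range(rank)) - D[i][j]
            for k in range(rank):
                gU[i][k] += err * V[j][k]
                gV[j][k] += err * U[i][k]
        # Gradient of the similarity (graph-Laplacian) penalty on viruses.
        for i in range(n):
            for j in range(n):
                if i != j and S[i][j] > 0:
                    for k in range(rank):
                        gU[i][k] += mu * S[i][j] * (U[i][k] - U[j][k])
        # Plain full-batch gradient step.
        for i in range(n):
            for k in range(rank):
                U[i][k] -= lr * gU[i][k]
        for j in range(m):
            for k in range(rank):
                V[j][k] -= lr * gV[j][k]

    return [[sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(m)]
            for i in range(n)]


def observed_rmse(D, D_hat):
    """Root-mean-square error over the observed (non-None) entries of D."""
    sq = [(D_hat[i][j] - D[i][j]) ** 2
          for i in range(len(D)) for j in range(len(D[0]))
          if D[i][j] is not None]
    return math.sqrt(sum(sq) / len(sq))


# Toy data (made up): viruses 0/1 and 2/3 are treated as sequence-similar pairs.
D_toy = [[1.0, 2.0, 3.0],
         [1.0, 2.0, None],
         [3.0, 6.0, 9.0],
         [None, 6.0, 9.0]]
S_toy = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
D_filled = complete_with_side_info(D_toy, S_toy)
```

In a cross-validation setting like the one described above, the same `observed_rmse` helper could be applied to entries that were held out before fitting rather than to the training entries.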
The software and data for this study are available at http://bi.sky.zstu.edu.cn/BMCSI/.
jialiang.yang@mssm.edu or pinganhe@zstu.edu.cn.
Supplementary data are available at Bioinformatics online.