Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, USA.
Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston, MA, USA.
BMC Bioinformatics. 2020 May 11;21(1):182. doi: 10.1186/s12859-020-3527-5.
In addition to causing the pandemic influenza outbreaks of 1918 and 2009, subtype H1N1 influenza A viruses (IAVs) have caused seasonal epidemics since 1977. The antigenic properties of influenza viruses are determined by both the protein sequence and the N-linked glycosylation of influenza glycoproteins, especially hemagglutinin (HA). Currently available computational methods consider only features in the protein sequence and not N-linked glycosylation.
A multi-task learning sparse group least absolute shrinkage and selection operator (LASSO) regression method (MTL-SGL) was developed and applied to identify two types of predominant features in HA, protein sequence and N-linked glycosylation, that affect variation in serologic data for human and swine H1N1 IAVs. The results suggested that mutations and changes in N-linked glycosylation sites are associated with the rise of antigenic variants of H1N1 IAVs. Furthermore, the implicated mutations are predominantly located at five reported antibody-binding sites, and within or close to the HA receptor binding site. All three N-linked glycosylation sites that MTL-SGL identified as determinants of antigenic change (i.e. sequons NCSV at HA 54, NHTV at HA 125, and NLSK at HA 160) were experimentally validated in the H1N1 antigenic variants using mass spectrometry analyses. Compared with conventional sparse learning methods, MTL-SGL achieved a lower prediction error and higher accuracy, indicating that the grouped features and multi-task learning in MTL-SGL not only handle serologic data generated from multiple reagents, supplies, and protocols, but also perform better in genetic sequence-based antigenic quantification.
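As an illustration of how candidate N-linked glycosylation features could be read from a protein sequence, the following minimal sketch scans for the canonical N-X-S/T sequon, where X is any residue except proline. It is not the authors' pipeline: the function name `find_sequons`, the four-residue reporting window (chosen to match the NCSV/NHTV/NLSK notation above), and the toy sequence are all assumptions, and the reported positions are offsets in the input string rather than HA numbering.

```python
import re

# Canonical N-linked sequon: N, then any residue except P, then S or T.
# The lookahead lets overlapping motifs be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (1-based position of N, four-residue window) for each sequon.

    Hypothetical helper for illustration; the window is shorter than four
    residues when the motif sits at the very end of the sequence.
    """
    seq = seq.upper()
    return [
        (m.start() + 1, seq[m.start():m.start() + 4])
        for m in SEQUON.finditer(seq)
    ]


if __name__ == "__main__":
    # Toy peptide (not a real HA sequence): NCSV and NHTV are sequons,
    # while NPST is skipped because proline follows the asparagine.
    toy = "AANCSVGGNPSTKNHTVQ"
    for position, window in find_sequons(toy):
        print(position, window)
```

Comparing the sequon lists of two aligned strains would give a simple gain/loss indicator per site, which is one plausible way to encode glycosylation as a feature alongside amino acid substitutions.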
In summary, the results of this study suggest that mutations and variations in N-glycosylation in HA caused antigenic variations in H1N1 IAVs and that the sequence-based antigenicity predictive model will be useful in understanding antigenic evolution of IAVs.