MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China.
Viruses. 2022 Feb 25;14(3):469. doi: 10.3390/v14030469.
To date, many experiments have shown that the functional balance between hemagglutinin (HA) and neuraminidase (NA) plays a crucial role in viral mobility, production, and transmission. However, whether and how HA and NA maintain this balance at the sequence level needs further investigation. Here, we applied principal component analysis and hierarchical clustering analysis to thousands of HA and NA sequences of the influenza A virus (IAV) subtypes A/H1N1 and A/H3N2. We discovered significant coevolution between HA and NA at the sequence level, which is closely related to the host species and to the epidemic year of the virus. Furthermore, we propose a sequence-to-sequence transformer model (S2STM), which mainly consists of an encoder and a decoder and adopts a multi-head attention mechanism to establish the mapping between HA and NA sequences. The training results show that the S2STM can effectively "translate" HA to NA or vice versa, thereby building a relationship network between them. Our work combines unsupervised and supervised machine learning methods to identify the sequence matching between HA and NA, which will advance our understanding of IAV evolution and also offers a new approach to sequence analysis.
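As a rough illustration of the unsupervised step described above, the sketch below clusters HA and NA sequences separately and checks whether the same strains group together. It is a minimal sketch under stated assumptions, not the authors' pipeline: the Hamming distance, the average-linkage rule, the function names, and the short toy strings (invented for illustration, not real HA or NA sequences) are all assumptions, and the principal component analysis step is omitted.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(seqs, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Assumption: average linkage over pairwise Hamming distances. The abstract
    does not say which distance or linkage the authors used.
    """
    clusters = [[i] for i in range(len(seqs))]
    dist = {
        (i, j): hamming(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def cluster_distance(c1, c2):
        # Mean distance over all cross-cluster pairs of sequences.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # combinations() yields a < b, so deleting index b after the merge is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: index i in both lists stands for the same strain.
toy_ha = ["MKAILVVLLY", "MKAILVVLLF", "MKTIIALSYI", "MKTIIALSHI"]
toy_na = ["MNPNQKIITI", "MNPNQKIITV", "MNTNQRIICL", "MNTNQRIICM"]

ha_clusters = average_linkage(toy_ha, n_clusters=2)
na_clusters = average_linkage(toy_na, n_clusters=2)

# If HA and NA co-vary, strains should fall into matching groups.
print("HA clusters:", ha_clusters)
print("NA clusters:", na_clusters)
print("Same grouping:", ha_clusters == na_clusters)
```

On real data, the sequences would first need to be aligned to a common length, and the resulting groups could then be compared against host species and sampling year.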