Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1682-1685. doi: 10.1109/EMBC46164.2021.9630673.
The Influenza virus is among the most severe viruses: it can infect multiple species, often with fatal consequences for the host. The Hemagglutinin (HA) gene of the virus is a potential target for antiviral drug development, which depends on accurately identifying its sub-types and, possibly, the hosts it targets. This paper focuses on accurately predicting whether an Influenza type A virus can infect specific hosts, namely Human, Avian and Swine hosts, using only the protein sequence of the HA gene. We propose encoding the protein sequences into numerical signals using the Hydrophobicity Index and then classifying them with a Convolutional Neural Network-based predictive model. The Influenza HA protein sequences used in this work were obtained from the Influenza Research Database (IRD); only complete and unique HA protein sequences from avian, human and swine hosts were used. The resulting dataset comprised 17999 human-host proteins, 17667 avian-host proteins and 9278 swine-host proteins. On this dataset, the proposed method yields as much as 10% higher accuracy for an individual class (namely, Avian) and 5% higher overall accuracy than an earlier study. The per-class accuracies in this work are also more balanced than those reported in that earlier study. The results show that the proposed model can classify HA protein sequences with high accuracy according to whether the virus under investigation can infect Human, Avian or Swine hosts.
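The encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which hydrophobicity scale is used, so the Kyte-Doolittle hydropathy scale is assumed here purely for illustration. The function name `encode_sequence`, the zero padding to a fixed length and the treatment of unknown residues are likewise hypothetical choices, not the authors' implementation.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
# ASSUMPTION: the abstract only says "Hydrophobicity Index"; this scale is
# an illustrative choice and may differ from the one used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq, length, pad_value=0.0, unknown_value=0.0):
    """Map a protein sequence to a fixed-length list of hydropathy values.

    Residues beyond `length` are truncated; shorter sequences are padded
    with `pad_value`. Symbols not in the table (e.g. 'X') are mapped to
    `unknown_value`. Both behaviours are hypothetical design choices.
    """
    signal = [KYTE_DOOLITTLE.get(res, unknown_value)
              for res in seq.upper()[:length]]
    signal.extend([pad_value] * (length - len(signal)))
    return signal


# Toy example: a five-residue fragment padded to length 8.
signal = encode_sequence("MKTII", length=8)
print(signal)  # [1.9, -3.9, -0.7, 4.5, 4.5, 0.0, 0.0, 0.0]
```

A fixed-length numerical signal of this kind is the sort of input a one-dimensional convolutional network could consume, with the three host labels (Human, Avian, Swine) as the classification targets.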