Chrysostomou Charalambos, Partaourides Harris, Seker Huseyin
Annu Int Conf IEEE Eng Med Biol Soc. 2017 Jul;2017:1186-1189. doi: 10.1109/EMBC.2017.8037042.
The Influenza type A virus can be considered one of the most severe viruses: it can infect multiple species, often with fatal consequences for the host. The Haemagglutinin (HA) gene of the virus is a potential target for antiviral drug development, which depends on accurate identification of its sub-types and, possibly, of the targeted hosts. In this paper, a method that accurately predicts whether an Influenza type A virus can infect human hosts, using only the HA gene, is therefore developed and tested. The method follows three main steps: (i) converting the protein sequences into numerical signals using the electron-ion interaction potential (EIIP) amino acid scale, (ii) analysing these signals with the Discrete Fourier Transform (DFT) and extracting DFT-based features, and (iii) classifying the sequences with an Artificial Neural Network trained on those features. For this analysis, 30724 Human, 18236 Avian and 8157 Swine HA protein sequences were collected from the Influenza Research Database. On this set of proteins, the proposed method yielded 97.36% (± 0.04%), 97.26% (± 0.26%), 0.978 (± 0.004), 0.963 (± 0.005) and 0.945 (± 0.005) for the training accuracy, validation accuracy, precision, recall and Matthews Correlation Coefficient (MCC) respectively, based on 10-fold cross-validation. The classification model, generated using one of the largest datasets, if not the largest, yields promising results that could lead to early detection of such viruses and help develop precautionary measures against possible human infections.
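Steps (i) and (ii) of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which DFT-based features were extracted, so the power spectrum is used here only as a stand-in. The EIIP table holds the commonly cited values and should be checked against a primary source before use. The function names `encode_eiip` and `dft_power_spectrum`, the choice to skip unknown residue codes, and the mean removal are all assumptions made for illustration.

```python
import cmath

# Commonly cited EIIP values per amino acid (in Rydberg).
# Assumption: verify against a primary source before relying on them.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode_eiip(sequence):
    """Step (i): map each residue letter to its EIIP value.

    Residues without an EIIP entry (e.g. 'X') are skipped; this is an
    illustrative choice, not something stated in the abstract.
    """
    return [EIIP[aa] for aa in sequence.upper() if aa in EIIP]


def dft_power_spectrum(signal):
    """Step (ii): power spectrum |X[k]|^2 for k = 1 .. N//2.

    The mean is removed first so the zero-frequency term does not
    dominate (an assumption; the abstract gives no such detail).
    """
    n = len(signal)
    if n == 0:
        return []
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


if __name__ == "__main__":
    # Toy peptide, not a real HA sequence.
    signal = encode_eiip("MKTIIALSYIF")
    print(dft_power_spectrum(signal))
```

For step (iii), a neural network needs feature vectors of equal length, while HA sequences vary in length. Some normalisation, such as zero-padding the signals to a common length or binning the spectrum, would therefore be needed; the abstract does not state which approach was taken.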