Signal-processing-based bioinformatics approach for the identification of influenza A virus subtypes in neuraminidase genes.

Author information

Chrysostomou Charalambos, Seker Huseyin

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:3066-9. doi: 10.1109/EMBC.2013.6610188.

Abstract

Neuraminidase (NA) genes of influenza A virus are highly promising candidates for antiviral drug development, which can only be realized through accurate identification of their subtypes. In this paper, in order to detect the subtypes accurately, a hybrid predictive model is developed and tested on proteins obtained from four subtypes of the influenza A virus, namely H1N1, H2N2, H3N2 and H5N1, which caused major pandemics in the twentieth century. The predictive model is built in four main steps: (i) encoding the protein sequences as numerical signals by means of the electron-ion interaction potential (EIIP) amino acid scale, (ii) analysing these signals (protein sequences) with the Discrete Fourier Transform (DFT) and extracting DFT-based features, (iii) selecting a more influential subset of the features with the F-score statistical feature selection method, and finally (iv) building a predictive model on the feature subset with a support vector machine classifier. The protein sequences were chosen for the high percentage identity they show within individual influenza subtype classes and the high variation they display in percentage identity. This makes the proteins very difficult to distinguish from each other even when they belong to different subtypes. On this set of proteins, the predictive model yielded 98.3% accuracy under 5-fold cross-validation. It also produced a twenty-feature subset that can help reveal spectral characteristics of the subtypes. The proposed model is promising and can easily be generalized to other similar studies.
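Steps (i) to (iii) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the DFT length, the exact feature definition, or how the F-score is extended to four classes, so the function names, the `n_points` parameter, the use of the power spectrum, and the multi-class F-score form below are all assumptions. The EIIP values are those commonly tabulated in the literature and should be checked against a reference before real use.

```python
import numpy as np

# EIIP value per amino acid, as commonly tabulated in the literature.
# Assumption: verify these numbers against a reference before real use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode_eiip(sequence):
    """Step (i): map each residue of a protein sequence to its EIIP value."""
    return np.array([EIIP[aa] for aa in sequence.upper()], dtype=float)


def dft_features(signal, n_points=64):
    """Step (ii): power spectrum of the mean-removed signal.

    n_points is a hypothetical fixed DFT length: shorter signals are
    zero-padded and longer ones truncated, so every sequence yields the
    same number of features.
    """
    centered = signal - signal.mean()
    spectrum = np.fft.rfft(centered, n=n_points)
    return np.abs(spectrum[1:]) ** 2  # drop the zero-frequency (DC) bin


def f_score(X, y):
    """Step (iii): F-score of each feature column.

    Assumed multi-class form: the sum over classes of the squared distance
    between the class mean and the overall mean, divided by the sum of the
    within-class sample variances. Each class needs at least two samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        group = X[y == label]
        between += (group.mean(axis=0) - overall_mean) ** 2
        within += group.var(axis=0, ddof=1)
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_top_features(X, y, k=20):
    """Return the column indices of the k features with the highest F-score."""
    return np.argsort(f_score(X, y))[::-1][:k]
```

For step (iv), the columns returned by `select_top_features` would be passed to a support vector machine and scored with 5-fold cross-validation, for example with scikit-learn's `SVC` and `cross_val_score`. That step is left out here because the abstract gives no kernel or hyperparameters.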
