A consistency-based feature selection method allied with linear SVMs for HIV-1 protease cleavage site prediction.

Affiliations

eSNAg Research Group, Department of Computer Engineering, TOBB University, Ankara, Turkey; Raccoon Software Computer R&D Ltd., Ankara, Turkey.

Publication information

PLoS One. 2013 Aug 23;8(8):e63145. doi: 10.1371/journal.pone.0063145. eCollection 2013.

Abstract

BACKGROUND

Predicting human immunodeficiency virus type 1 (HIV-1) protease cleavage sites in protein molecules and determining the specificity of the protease is an important task that has attracted considerable attention in the research community. Advances in this area are expected to enable effective drug design against this life-threatening virus, especially the design of HIV-1 protease inhibitors. However, obstacles such as the shortage of available training data and the high dimensionality of the feature space make this a difficult classification problem. Various machine learning techniques, and several classification methods in particular, have therefore been proposed to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy.

RESULTS

We propose a consistency-based feature selection approach for HIV-1 data, used in conjunction with recursive feature elimination based on support vector machines (SVMs). We used various classifiers to evaluate the results of the feature selection process. We further demonstrated the effectiveness of the proposed method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results on attributes selected from different combinations.
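The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common consistency criterion (the inconsistency rate of a feature subset) combined with backward elimination. The function names, the toy data, and the elimination order are hypothetical; in particular, the `ranking` argument stands in for an ordering that could come from SVM-based recursive feature elimination, which is not implemented here, and nothing below should be read as the authors' actual implementation.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of samples whose label differs from the majority label
    among samples that share the same values on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def backward_select(rows, labels, ranking, tolerance=0.0):
    """Try to drop features in the order given by `ranking` (least important
    first, an assumed input), keeping a drop only if the inconsistency rate
    does not rise above the full-feature baseline plus `tolerance`."""
    selected = list(range(len(rows[0])))
    baseline = inconsistency_rate(rows, labels, selected)
    for feature in ranking:
        trial = [f for f in selected if f != feature]
        if trial and inconsistency_rate(rows, labels, trial) <= baseline + tolerance:
            selected = trial
    return selected


# Toy categorical data (hypothetical): the label is 1 only when
# feature 0 is "A" and feature 2 is "G"; features 1 and 3 are noise.
rows = [
    ("A", "x", "G", "p"),
    ("A", "y", "G", "q"),
    ("A", "x", "L", "p"),
    ("S", "y", "G", "q"),
    ("S", "x", "L", "p"),
    ("S", "y", "L", "q"),
]
labels = [1, 1, 0, 0, 0, 0]

selected = backward_select(rows, labels, ranking=[3, 1, 2, 0])
print(selected)
```

On this toy data the two noise features can be dropped without creating any conflicting samples, while dropping either informative feature makes two differently labeled samples indistinguishable, so the sketch keeps features 0 and 2.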

CONCLUSION

Applying feature selection to the training data before carrying out the classification task appears to be a reasonable data-mining step when working with data similar to the HIV-1 data. On HIV-1 data, several feature selection or extraction operations have been tested in conjunction with different classifiers, and noteworthy outcomes have been reported. These facts motivate the work presented in this paper.

SOFTWARE AVAILABILITY

The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar. It can also be downloaded from esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; the download includes a readme file that explains how to set up the software.

