Why neural networks should not be used for HIV-1 protease cleavage site prediction.

Author information

Rögnvaldsson Thorsteinn, You Liwen

Affiliation

Intelligent Systems Laboratory, School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, 301 18 Sweden.

Publication information

Bioinformatics. 2004 Jul 22;20(11):1702-9. doi: 10.1093/bioinformatics/bth144. Epub 2004 Feb 26.

DOI:10.1093/bioinformatics/bth144

PMID:14988129

Abstract

UNLABELLED

Several papers have been published where nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

MOTIVATION

Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

RESULTS

We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
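
The idea of training a simple perceptron and reading rules off its weights can be sketched as follows. This is a minimal illustration, not the authors' code: the octapeptide window (P4 to P4'), the orthogonal (one-hot) encoding, the helper names and the synthetic peptides are all assumptions made for the example, and the labeling rule (phenylalanine at P1) is an arbitrary toy rule, not a result from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed window: eight residues around the scissile bond.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def encode(peptide):
    """One-hot encode a peptide: 8 positions x 20 residues = 160 inputs."""
    vec = [0.0] * (len(POSITIONS) * len(AMINO_ACIDS))
    for pos, residue in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1.0
    return vec


def predict(w, b, x):
    """Linear threshold unit: +1 (cleaved) or -1 (not cleaved)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_perceptron(samples, labels, max_epochs=1000):
    """Rosenblatt perceptron; an epoch with zero mistakes means the
    training set has been separated by a hyperplane."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


def strongest_weights(w, k=5):
    """Return the k largest weights as (position, residue, weight)."""
    ranked = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
    n = len(AMINO_ACIDS)
    return [(POSITIONS[i // n], AMINO_ACIDS[i % n], w[i]) for i in ranked]


# Synthetic toy data (NOT the published dataset): random peptides,
# labeled +1 only when an F is placed at P1.
rng = random.Random(0)
peptides, labels = [], []
for i in range(40):
    residues = [rng.choice(AMINO_ACIDS) for _ in POSITIONS]
    if i % 2 == 0:
        residues[3] = "F"
        labels.append(1)
    else:
        if residues[3] == "F":
            residues[3] = "A"
        labels.append(-1)
    peptides.append("".join(residues))

samples = [encode(p) for p in peptides]
w, b, converged = train_perceptron(samples, labels)
print("separated training set:", converged)
for position, residue, weight in strongest_weights(w):
    print(position, residue, weight)
```

If the perceptron reaches an epoch without mistakes, the training set is linearly separable under the chosen encoding. Each weight belongs to one residue at one position, so large positive weights can be read as candidate "residue X at position Y favors cleavage" rules, which is what makes a linear model easy to interpret.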

AVAILABILITY

The datasets used are available at http://www.hh.se/staff/bioinf/

Similar articles

Accurate prediction of HIV-1 drug response from the reverse transcriptase and protease amino acid sequences using sparse models created by convex optimization.

Bioinformatics. 2006 Mar 1;22(5):541-9. doi: 10.1093/bioinformatics/btk011. Epub 2005 Dec 20.

Mining HIV protease cleavage data using genetic programming with a sum-product function.

Bioinformatics. 2004 Dec 12;20(18):3398-405. doi: 10.1093/bioinformatics/bth414. Epub 2004 Jul 15.

Prediction of caspase cleavage sites using Bayesian bio-basis function neural networks.

Bioinformatics. 2005 May 1;21(9):1831-7. doi: 10.1093/bioinformatics/bti281. Epub 2005 Jan 25.

Specificity rule discovery in HIV-1 protease cleavage site analysis.

Comput Biol Chem. 2008 Feb;32(1):71-8. doi: 10.1016/j.compbiolchem.2007.09.006. Epub 2007 Sep 29.

Predicting hepatitis C virus protease cleavage sites using generalized linear indicator regression models.

IEEE Trans Biomed Eng. 2006 Oct;53(10):2119-23. doi: 10.1109/TBME.2006.881779.

Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

Bioinformatics. 2007 Dec 1;23(23):3147-54. doi: 10.1093/bioinformatics/btm505. Epub 2007 Oct 17.

Variable context Markov chains for HIV protease cleavage site prediction.

Biosystems. 2009 Jun;96(3):246-50. doi: 10.1016/j.biosystems.2009.03.001. Epub 2009 Mar 24.

Ensemble classifier for protein fold pattern recognition.

Bioinformatics. 2006 Jul 15;22(14):1717-22. doi: 10.1093/bioinformatics/btl170. Epub 2006 May 3.

Reduced bio-basis function neural networks for protease cleavage site prediction.

J Bioinform Comput Biol. 2004 Sep;2(3):511-31. doi: 10.1142/s0219720004000715.

Cited by

Effectively predicting HIV-1 protease cleavage sites by using an ensemble learning approach.

BMC Bioinformatics. 2022 Oct 27;23(1):447. doi: 10.1186/s12859-022-04999-y.

SARS-CoV-2 3CLpro whole human proteome cleavage prediction and enrichment/depletion analysis.

Comput Biol Chem. 2022 Jun;98:107671. doi: 10.1016/j.compbiolchem.2022.107671. Epub 2022 Mar 28.

Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning.

Front Genet. 2021 Mar 26;12:658078. doi: 10.3389/fgene.2021.658078. eCollection 2021.

The importance of physicochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity.

Bioengineered. 2016 Apr 2;7(2):65-78. doi: 10.1080/21655979.2016.1149271.

Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction.

Biomed Res Int. 2015;2015:263586. doi: 10.1155/2015/263586. Epub 2015 Apr 15.

Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics.

Bioengineered. 2014 Mar-Apr;5(2):80-95. doi: 10.4161/bioe.26997. Epub 2013 Dec 16.

A consistency-based feature selection method allied with linear SVMs for HIV-1 protease cleavage site prediction.

PLoS One. 2013 Aug 23;8(8):e63145. doi: 10.1371/journal.pone.0063145. eCollection 2013.

SVM-based prediction of propeptide cleavage sites in spider toxins identifies toxin innovation in an Australian tarantula.

PLoS One. 2013 Jul 22;8(7):e66279. doi: 10.1371/journal.pone.0066279. Print 2013.

OETMAP: a new feature encoding scheme for MHC class I binding prediction.

Mol Cell Biochem. 2012 Jan;359(1-2):67-72. doi: 10.1007/s11010-011-1000-5. Epub 2011 Jul 30.

Predicting human immunodeficiency virus protease cleavage sites in nonlinear projection space.

Mol Cell Biochem. 2010 Jun;339(1-2):127-33. doi: 10.1007/s11010-009-0376-y. Epub 2010 Jan 7.
