An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins.

Author information

Nanni Loris, Lumini Alessandra

Affiliation

DEIS, IEIIT--CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy.

Publication information

Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.

DOI:10.1007/s00726-008-0044-7

PMID:18288459

Abstract

It is well established in the literature that an ensemble of classifiers performs well relative to a stand-alone method. It is therefore important to develop ensemble methods well suited to bioinformatics data. In this work, we propose combining the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to predicting DNA-binding proteins. Two classifiers are tested: the linear support vector machine and the radial basis function support vector machine. Accuracy and Matthews's correlation coefficient are reported as performance indicators. With jackknife cross-validation, our ensemble method obtains a Matthews's correlation coefficient of approximately 0.97. This result exceeds those reported in the literature on the same dataset with features extracted directly from the amino-acid sequence.
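To make two of the ideas in the abstract concrete, the following is a minimal Python sketch of recoding a sequence with a reduced amino-acid alphabet and of computing accuracy and Matthews's correlation coefficient from confusion-matrix counts. Everything named here is an assumption for illustration: the four groups in `EXAMPLE_GROUPS` are hypothetical (the abstract does not list the alphabets found by the Genetic Algorithm), and the per-group composition is only a simple stand-in feature, not the grouped-weight encoding used in the paper.

```python
from math import sqrt

# Hypothetical 4-group reduced alphabet covering the 20 standard residues.
# Illustrative only: the paper's alphabets come from a Genetic Algorithm.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQGP", "DEKR"]


def build_mapping(groups):
    """Map each amino-acid letter to the index of the group containing it."""
    return {aa: idx for idx, group in enumerate(groups) for aa in group}


def reduce_sequence(sequence, mapping):
    """Recode a protein sequence as a list of group indices."""
    return [mapping[aa] for aa in sequence]


def group_composition(sequence, groups):
    """Fraction of residues in each group (a stand-in feature vector)."""
    reduced = reduce_sequence(sequence, build_mapping(groups))
    return [reduced.count(i) / len(reduced) for i in range(len(groups))]


def accuracy(tp, tn, fp, fn):
    """Share of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews's correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In an ensemble along the lines the abstract describes, one feature vector per alphabet would be passed to its own classifier and the outputs combined; that combination step is not shown because the abstract does not specify it.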

Similar articles

An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins.

Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.

Combing ontologies and dipeptide composition for predicting DNA-binding proteins.

Amino Acids. 2008 May;34(4):635-41. doi: 10.1007/s00726-007-0016-3. Epub 2008 Jan 4.

Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization.

Amino Acids. 2008 May;34(4):653-60. doi: 10.1007/s00726-007-0018-1. Epub 2008 Jan 4.

Predicting DNA- and RNA-binding proteins from sequences with kernel methods.

J Theor Biol. 2009 May 21;258(2):289-93. doi: 10.1016/j.jtbi.2009.01.024. Epub 2009 Feb 6.

Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins.

Amino Acids. 2009 Mar;36(3):409-16. doi: 10.1007/s00726-008-0076-z. Epub 2008 Apr 10.

An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence.

Amino Acids. 2008 Oct;35(3):573-80. doi: 10.1007/s00726-008-0083-0. Epub 2008 Apr 22.

Prediction of nuclear receptors with optimal pseudo amino acid composition.

Anal Biochem. 2009 Apr 1;387(1):54-9. doi: 10.1016/j.ab.2009.01.018. Epub 2009 Jan 19.

Using Chou's pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location.

Amino Acids. 2008 May;34(4):669-75. doi: 10.1007/s00726-008-0034-9. Epub 2008 Feb 7.

A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.

In Silico Biol. 2008;8(2):129-40.

Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes.

Amino Acids. 2007 Nov;33(4):623-9. doi: 10.1007/s00726-007-0496-1. Epub 2007 Feb 19.

Cited by

Prediction of RNA- and DNA-Binding Proteins Using Various Machine Learning Classifiers.

Avicenna J Med Biotechnol. 2019 Jan-Mar;11(1):104-111.

Improved detection of DNA-binding proteins via compression technology on PSSM information.

PLoS One. 2017 Sep 29;12(9):e0185587. doi: 10.1371/journal.pone.0185587. eCollection 2017.

nDNA-Prot: identification of DNA-binding proteins based on unbalanced classification.

BMC Bioinformatics. 2014 Sep 8;15(1):298. doi: 10.1186/1471-2105-15-298.

iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.

Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.

Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.

PLoS One. 2014 Jan 24;9(1):e86703. doi: 10.1371/journal.pone.0086703. eCollection 2014.

iDNA-Prot: identification of DNA binding proteins using random forest with grey model.

PLoS One. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756. Epub 2011 Sep 15.
