Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information.

Affiliation

University of Padua, Padua.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):467-75. doi: 10.1109/TCBB.2011.117. Epub 2011 Aug 18.

Abstract

A reliable method for predicting bacterial virulent proteins has several important applications in research aimed at finding novel drug targets, identifying vaccine candidates, and understanding virulence mechanisms in pathogens. In this work, we study several feature extraction approaches for representing proteins and propose a novel bacterial virulent protein prediction method based on an ensemble of classifiers, where the features are extracted directly from the amino acid sequence and from the evolutionary information of a given protein. We evaluated and compared several ensembles obtained by combining six feature extraction methods with several classification approaches based on two general-purpose classifiers (a Support Vector Machine and a variant of the input decimated ensemble) and their random subspace versions. An extensive evaluation was performed according to a blind testing protocol, in which the parameters of the system are optimized on the training set and the system is validated on three different independent data sets; this allows selection of the best-performing system and demonstrates the validity of the proposed method. The blind-test results show that, although the best-performing stand-alone method is not always the same across the independent data sets, the fusion of different methods enhances prediction efficiency on all of them.
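
The idea of fusing classifiers built on different sequence representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion rule or the exact descriptors, so the plain amino acid composition feature, the averaging (sum-rule) fusion, the 0.5 threshold, and all function names below are assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional relative frequency vector of a sequence.

    Non-standard residue symbols (e.g. 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fuse_scores(scores_per_classifier):
    """Sum-rule fusion (assumed here): average the scores of several classifiers.

    scores_per_classifier holds one score list per classifier; all lists
    must score the same proteins in the same order, with scores in [0, 1].
    """
    if not scores_per_classifier:
        raise ValueError("at least one classifier is required")
    length = len(scores_per_classifier[0])
    if any(len(scores) != length for scores in scores_per_classifier):
        raise ValueError("all classifiers must score the same proteins")
    n = len(scores_per_classifier)
    return [sum(column) / n for column in zip(*scores_per_classifier)]


def predict_virulent(fused_scores, threshold=0.5):
    """Label a protein as virulent when its fused score reaches the threshold."""
    return [score >= threshold for score in fused_scores]
```

In this sketch each classifier would be trained on its own feature set (for example, a composition-based descriptor or one derived from evolutionary information), and only their output scores are combined, so a method that is weak on one data set can be compensated by the others.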

Similar articles

Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition.

Protein Pept Lett. 2012 Apr;19(4):375-87. doi: 10.2174/092986612799789369.

High performance set of PseAAC and sequence based descriptors for protein classification.

J Theor Biol. 2010 Sep 7;266(1):1-10. doi: 10.1016/j.jtbi.2010.06.006. Epub 2010 Jun 15.

Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization.

Amino Acids. 2008 May;34(4):653-60. doi: 10.1007/s00726-007-0018-1. Epub 2008 Jan 4.

SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition.

J Theor Biol. 2010 Nov 7;267(1):1-6. doi: 10.1016/j.jtbi.2010.08.001. Epub 2010 Aug 5.

Identifying GPCRs and their types with Chou's pseudo amino acid composition: an approach from multi-scale energy representation and position specific scoring matrix.

Protein Pept Lett. 2012 Aug;19(8):890-903. doi: 10.2174/092986612801619589.

Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine.

J Theor Biol. 2011 Jul 21;281(1):18-23. doi: 10.1016/j.jtbi.2011.04.017. Epub 2011 Apr 28.

Supersecondary structure prediction using Chou's pseudo amino acid composition.

J Comput Chem. 2011 Jan 30;32(2):271-8. doi: 10.1002/jcc.21616.

Prediction of Golgi-resident protein types using general form of Chou's pseudo-amino acid compositions: Approaches with minimal redundancy maximal relevance feature selection.

J Theor Biol. 2016 Aug 7;402:38-44. doi: 10.1016/j.jtbi.2016.04.032. Epub 2016 May 4.

Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion.

Amino Acids. 2006 Jun;30(4):461-8. doi: 10.1007/s00726-006-0263-8. Epub 2006 May 15.

Cited by

Analysis of protein determinants of host-specific infection properties of polyomaviruses using machine learning.

Genes Genomics. 2021 Apr;43(4):407-420. doi: 10.1007/s13258-021-01059-2. Epub 2021 Mar 1.

Harnessing Machine Learning To Unravel Protein Degradation in Escherichia coli.

mSystems. 2021 Feb 2;6(1):e01296-20. doi: 10.1128/mSystems.01296-20.

DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa125.

Some illuminating remarks on molecular genetics and genomics as well as drug development.

Mol Genet Genomics. 2020 Mar;295(2):261-274. doi: 10.1007/s00438-019-01634-z. Epub 2020 Jan 1.

PTPD: predicting therapeutic peptides by deep learning and word2vec.

BMC Bioinformatics. 2019 Sep 6;20(1):456. doi: 10.1186/s12859-019-3006-z.

DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.

J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.

Encodings and models for antimicrobial peptide classification for multi-resistant pathogens.

BioData Min. 2019 Mar 4;12:7. doi: 10.1186/s13040-019-0196-x. eCollection 2019.

SeqSVM: A Sequence-Based Support Vector Machine Method for Identifying Antioxidant Proteins.

Int J Mol Sci. 2018 Jun 15;19(6):1773. doi: 10.3390/ijms19061773.

iRNA-3typeA: Identifying Three Types of Modification at RNA's Adenosine Sites.

Mol Ther Nucleic Acids. 2018 Jun 1;11:468-474. doi: 10.1016/j.omtn.2018.03.012. Epub 2018 Mar 30.

A Novel Hybrid Sequence-Based Model for Identifying Anticancer Peptides.

Genes (Basel). 2018 Mar 13;9(3):158. doi: 10.3390/genes9030158.
