
Profiles and majority voting-based ensemble method for protein secondary structure prediction.

Affiliation

Department of Computer Science, USTO-MB University, BP 1505 El Mnaouer, Oran, Algeria.

Publication information

Evol Bioinform Online. 2011;7:171-89. doi: 10.4137/EBO.S7931. Epub 2011 Oct 10.

DOI:10.4137/EBO.S7931
PMID:22058650
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3204938/
Abstract

Machine learning techniques have been widely applied to the problem of predicting protein secondary structure from the amino acid sequence, and they have achieved substantial success in this research area. Many methods have been used, including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. Prediction accuracy has improved continuously over the years, especially through hybrid or ensemble methods and through the incorporation of evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method that combines the outputs of two feed-forward ANNs, a k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method can give over the individual classifiers that make up the ensemble, we have experimented with the proposed system on the two widely used benchmark datasets RS126 and CB513, using cross-validation tests with PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier.
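
The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which two variants of the majority voting rule were used, so the simple plurality vote and the weighted vote below are assumptions chosen for illustration, not the authors' method. The function names `majority_vote` and `weighted_vote`, the three-state alphabet (H for helix, E for strand, C for coil), and the fallback to coil on ties are likewise hypothetical.

```python
from collections import Counter


def _check_lengths(predictions):
    """Ensure there is at least one prediction and all have equal length."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")


def majority_vote(predictions, tie_break="C"):
    """Combine per-residue labels from several classifiers by plurality.

    predictions: list of equal-length strings over {"H", "E", "C"},
                 one string per ensemble member.
    tie_break:   label used when no single state wins (an assumption).
    """
    _check_lengths(predictions)
    combined = []
    for column in zip(*predictions):  # one column per residue position
        counts = Counter(column)
        top = max(counts.values())
        winners = [label for label, n in counts.items() if n == top]
        combined.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(combined)


def weighted_vote(predictions, weights, tie_break="C"):
    """Like majority_vote, but each classifier's vote counts with a weight.

    weights: one non-negative number per classifier, for example its
             accuracy on a validation set (hypothetical choice).
    """
    _check_lengths(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per classifier is required")
    combined = []
    for column in zip(*predictions):
        scores = {}
        for label, weight in zip(column, weights):
            scores[label] = scores.get(label, 0) + weight
        best = max(scores.values())
        winners = [label for label, s in scores.items() if s == best]
        combined.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(combined)
```

For example, with six hypothetical member outputs for a six-residue fragment, `majority_vote(["HHHECC", "HHEECC", "HCEECC", "CCEEHC", "HHHECC", "EHEECC"])` returns `"HHEECC"`. With an even number of members, ties are possible, which is why the sketch takes an explicit `tie_break` argument instead of silently picking the first label seen.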

