Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

Author information

Shamim Mohammad Tabrez Anwar, Anwaruddin Mohammad, Nagarajaram H A

Affiliation

Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500 076, India.

Publication information

Bioinformatics. 2007 Dec 15;23(24):3320-7. doi: 10.1093/bioinformatics/btm527. Epub 2007 Nov 7.

DOI:10.1093/bioinformatics/btm527
PMID:17989092
Abstract

MOTIVATION

Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features.

RESULTS

We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM)-based classifier that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined, secondary structural state frequencies of amino acids gave an overall fold discrimination accuracy of 65.2%, which is better than the accuracy of any method reported so far in the literature. Combining secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method, known as the Crammer and Singer method, for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one-versus-all, one-versus-one and the Crammer and Singer method, yield similar predictions.
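The abstract names the feature types but not their exact encoding. The following is a minimal Python sketch of one plausible way to build such a frequency vector from a sequence and per-residue state strings. Everything specific here is an assumption rather than the authors' definition: a three-state secondary structure alphabet (H, E, C), a two-state solvent accessibility alphabet (B, E), amino acid pairs taken as sequence-adjacent residues, normalisation by the number of residues or pairs, and the hypothetical function names. How the state strings are obtained (predicted or observed) is also left open.

```python
from itertools import product

# ASSUMED alphabets (not specified in the abstract): the 20 standard amino
# acids, a 3-state secondary structure code (H = helix, E = strand, C = coil)
# and a 2-state solvent accessibility code (B = buried, E = exposed).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"
SA_STATES = "BE"


def residue_state_frequencies(seq, states, alphabet):
    """Fraction of residues observed as each (amino acid, state) combination."""
    keys = [aa + st for aa, st in product(AMINO_ACIDS, alphabet)]
    counts = dict.fromkeys(keys, 0)
    for aa, st in zip(seq, states):
        key = aa + st
        if key in counts:  # ignore non-standard residues and unknown states
            counts[key] += 1
    total = max(len(seq), 1)
    return [counts[k] / total for k in keys]


def pair_state_frequencies(seq, states, alphabet):
    """Fraction of adjacent residue pairs observed as each (pair, state pair).

    HYPOTHETICAL pair definition: residues i and i+1 in the chain, keyed by
    both residue identities and both of their states.
    """
    keys = [a1 + a2 + s1 + s2
            for a1, a2, s1, s2 in product(AMINO_ACIDS, AMINO_ACIDS, alphabet, alphabet)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - 1):
        key = seq[i] + seq[i + 1] + states[i] + states[i + 1]
        if key in counts:
            counts[key] += 1
    total = max(len(seq) - 1, 1)
    return [counts[k] / total for k in keys]


def fold_feature_vector(seq, ss, sa):
    """Concatenate SS residue, SA residue and SA pair frequencies (60 + 40 + 1600)."""
    if not (len(seq) == len(ss) == len(sa)):
        raise ValueError("sequence and state strings must have equal length")
    return (residue_state_frequencies(seq, ss, SS_STATES)
            + residue_state_frequencies(seq, sa, SA_STATES)
            + pair_state_frequencies(seq, sa, SA_STATES))


if __name__ == "__main__":
    vec = fold_feature_vector("ACDA", "HHEC", "BEEB")
    print(len(vec))  # 1700
```

Vectors like these would then be fed to a multi-class SVM. One-versus-all trains one binary SVM per fold (K classifiers for K folds), one-versus-one trains one per pair of folds (K(K-1)/2 classifiers, combined by voting), and the Crammer and Singer formulation optimises all classes jointly in a single problem. In scikit-learn, for example, these roughly correspond to `OneVsRestClassifier(SVC())`, `SVC` (which uses one-versus-one internally) and `LinearSVC(multi_class="crammer_singer")`; this is one possible toolkit, not necessarily the software used in the study.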

AVAILABILITY

Dataset and stand-alone program are available upon request.

Similar articles

1. Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.
   Bioinformatics. 2007 Dec 15;23(24):3320-7. doi: 10.1093/bioinformatics/btm527. Epub 2007 Nov 7.
2. Ensemble classifier for protein fold pattern recognition.
   Bioinformatics. 2006 Jul 15;22(14):1717-22. doi: 10.1093/bioinformatics/btl170. Epub 2006 May 3.
3. PFRES: protein fold classification by using evolutionary information and predicted secondary structure.
   Bioinformatics. 2007 Nov 1;23(21):2843-50. doi: 10.1093/bioinformatics/btm475. Epub 2007 Oct 17.
4. Support vector machines for prediction of dihedral angle regions.
   Bioinformatics. 2006 Dec 15;22(24):3009-15. doi: 10.1093/bioinformatics/btl489. Epub 2006 Sep 27.
5. Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers.
   IEEE Trans Nanobioscience. 2009 Mar;8(1):100-10. doi: 10.1109/TNB.2009.2016488. Epub 2009 Mar 10.
6. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.
   Bioinformatics. 2007 Dec 1;23(23):3147-54. doi: 10.1093/bioinformatics/btm505. Epub 2007 Oct 17.
7. Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method.
   Bioinformatics. 2005 Jun 15;21(12):2844-9. doi: 10.1093/bioinformatics/bti423. Epub 2005 Apr 6.
8. A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.
   In Silico Biol. 2008;8(2):129-40.
9. Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes.
   Amino Acids. 2007 Nov;33(4):623-9. doi: 10.1007/s00726-007-0496-1. Epub 2007 Feb 19.
10. A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.
   J Mol Biol. 1997 Apr 11;267(4):1026-38. doi: 10.1006/jmbi.1997.0924.

Cited by

1. From PDB files to protein features: a comparative analysis of PDB bind and STCRDAB datasets.
   Med Biol Eng Comput. 2024 Aug;62(8):2449-2483. doi: 10.1007/s11517-024-03074-3. Epub 2024 Apr 16.
2. When Protein Structure Embedding Meets Large Language Models.
   Genes (Basel). 2023 Dec 23;15(1):25. doi: 10.3390/genes15010025.
3. Ion-pumping microbial rhodopsin protein classification by machine learning approach.
   BMC Bioinformatics. 2023 Jan 27;24(1):29. doi: 10.1186/s12859-023-05138-x.
4. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field.
   Front Bioeng Biotechnol. 2022 Jul 7;10:788300. doi: 10.3389/fbioe.2022.788300. eCollection 2022.
5. PupStruct: Prediction of Pupylated Lysine Residues Using Structural Properties of Amino Acids.
   Genes (Basel). 2020 Nov 28;11(12):1431. doi: 10.3390/genes11121431.
6. AOPs-SVM: A Sequence-Based Classifier of Antioxidant Proteins Using a Support Vector Machine.
   Front Bioeng Biotechnol. 2019 Sep 18;7:224. doi: 10.3389/fbioe.2019.00224. eCollection 2019.
7. Identifying anticancer peptides by using a generalized chaos game representation.
   J Math Biol. 2019 Jan;78(1-2):441-463. doi: 10.1007/s00285-018-1279-x. Epub 2018 Oct 5.
8. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine.
   PeerJ. 2017 Sep 4;5:e3561. doi: 10.7717/peerj.3561. eCollection 2017.
9. Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition.
   Int J Mol Sci. 2016 Dec 16;17(12):2118. doi: 10.3390/ijms17122118.
10. Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures.
   Evol Bioinform Online. 2016 Dec 4;12:285-302. doi: 10.4137/EBO.S40912. eCollection 2016.