Enzyme family classification by support vector machines.

Authors

Cai C Z, Han L Y, Ji Z L, Chen Y Z

Affiliation

Department of Applied Physics, Chongqing University, Chongqing, People's Republic of China.

Publication

Proteins. 2004 Apr 1;55(1):66-76. doi: 10.1002/prot.20045.

PMID:14997540
Abstract

One approach for facilitating protein function prediction is to classify proteins into functional families. Recent studies on the classification of G-protein coupled receptors and other proteins suggest that a statistical learning method, support vector machines (SVM), may be potentially useful for classifying proteins into functional families. In this work, SVM is applied and tested on the classification of enzymes into functional families defined by the Enzyme Nomenclature Committee of IUBMB. The SVM classification system for each family is trained from representative enzymes of that family and seed proteins of Pfam curated protein families. The classification accuracy for enzymes from 46 families and for non-enzymes is in the range of 50.0% to 95.7% and 79.0% to 100%, respectively. The corresponding Matthews correlation coefficient is in the range of 54.1% to 96.1%. Moreover, 80.3% of the 8,291 correctly classified enzymes are uniquely classified into a specific enzyme family by using a scoring function, indicating that SVM may have a certain level of unique prediction capability. Testing results also suggest that SVM is in some cases capable of classifying distantly related enzymes and homologous enzymes of different functions. Efforts are being made to use a more comprehensive set of enzymes as training sets and to incorporate multi-class SVM classification systems to further enhance the unique prediction accuracy. Our results suggest the potential of SVM for enzyme family classification and for facilitating protein function prediction. Our software is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

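The abstract describes one SVM per enzyme family, trained with that family's enzymes as positives, plus a scoring function that assigns a correctly classified enzyme to a single family. The following is a minimal sketch of that one-vs-rest setup under stated assumptions, not the authors' implementation: it uses amino-acid composition as a stand-in feature vector (the abstract does not specify the descriptors), trains a plain linear SVM with full-batch Pegasos (subgradient descent on the L2-regularized hinge loss), and applies a hypothetical scoring rule in which the largest positive decision value wins. The toy families, sequences, and all function names are invented for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Fraction of each standard residue in seq, plus a constant 1.0 bias feature."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS] + [1.0]


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(xs, ys, lam=0.01, iters=1000):
    """Linear SVM via full-batch Pegasos on the L2-regularized hinge loss.

    ys are +1 (family member) or -1 (non-member). The bias is the last feature,
    so it is regularized along with the weights (a simplification).
    """
    n, dim = len(xs), len(xs[0])
    w = [0.0] * dim
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)  # Pegasos step size
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            if y * dot(w, x) < 1.0:  # margin violated: this hinge term is active
                for j in range(dim):
                    grad[j] += y * x[j] / n
        # Shrink (regularization) and step toward the margin-violating examples.
        w = [(1.0 - eta * lam) * wj + eta * gj for wj, gj in zip(w, grad)]
    return w


def train_one_vs_rest(families, non_members, **kw):
    """One binary SVM per family: its members are positives; members of the
    other families and the non-member sequences are negatives."""
    models = {}
    for name, seqs in families.items():
        negatives = [s for other, ss in families.items() if other != name for s in ss]
        negatives += non_members
        xs = [composition(s) for s in seqs + negatives]
        ys = [1] * len(seqs) + [-1] * len(negatives)
        models[name] = train_pegasos(xs, ys, **kw)
    return models


def classify(models, seq):
    """Hypothetical scoring rule: among families whose SVM returns a positive
    decision value, pick the largest; return None if no family SVM fires."""
    x = composition(seq)
    scores = {name: dot(w, x) for name, w in models.items()}
    positive = {name: s for name, s in scores.items() if s > 0}
    if not positive:
        return None
    return max(positive, key=positive.get)


# Toy "families" dominated by a single residue (hypothetical data).
families = {
    "family_K": ["KKKKKKKKKK", "KKKKKKKKKR"],
    "family_D": ["DDDDDDDDDD", "DDDDDDDDDE"],
}
non_members = ["GGGGGGGGGG", "AAAAAAAAAA"]
models = train_one_vs_rest(families, non_members)
print(classify(models, "KKKKKKKKKK"), classify(models, "GGGGGGGGGG"))
```

Folding the bias into a constant feature keeps the update rule short at the cost of regularizing the bias; for real sequence data an established SVM library and richer descriptors would be the usual choice.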

Similar articles

1. Enzyme family classification by support vector machines.
Proteins. 2004 Apr 1;55(1):66-76. doi: 10.1002/prot.20045.
2. Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach.
Nucleic Acids Res. 2004 Dec 7;32(21):6437-44. doi: 10.1093/nar/gkh984. Print 2004.
3. Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach.
BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S13. doi: 10.1186/1471-2105-7-S5-S13.
4. SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence.
Nucleic Acids Res. 2003 Jul 1;31(13):3692-7. doi: 10.1093/nar/gkg600.
5. Prediction of transporter family from protein sequence by support vector machine approach.
Proteins. 2006 Jan 1;62(1):218-31. doi: 10.1002/prot.20605.
6. Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties.
Mol Immunol. 2007 Jan;44(4):514-20. doi: 10.1016/j.molimm.2006.02.010. Epub 2006 Mar 23.
7. Prediction of functional class of novel bacterial proteins without the use of sequence similarity by a statistical learning method.
J Mol Microbiol Biotechnol. 2005;9(2):86-100. doi: 10.1159/000088839.
8. Protein classification based on text document classification techniques.
Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.
9. Prediction of RNA-binding proteins from primary sequence by a support vector machine approach.
RNA. 2004 Mar;10(3):355-68. doi: 10.1261/rna.5890304.
10. Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes.
J Theor Biol. 2007 Oct 7;248(3):546-51. doi: 10.1016/j.jtbi.2007.06.001. Epub 2007 Jun 9.

Cited by

1. iAMP-CRA: Identifying Antimicrobial Peptides Using Convolutional Recurrent Neural Network with Self-Attention.
Health Inf Sci Syst. 2025 Mar 5;13(1):25. doi: 10.1007/s13755-025-00342-w. eCollection 2025 Dec.
2. ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution.
Heliyon. 2024 Aug 22;10(17):e36041. doi: 10.1016/j.heliyon.2024.e36041. eCollection 2024 Sep 15.
3. PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs.
Mol Divers. 2024 Aug;28(4):2301-2315. doi: 10.1007/s11030-024-10937-2. Epub 2024 Jul 21.
4. A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction.
Protein J. 2024 Apr;43(2):171-186. doi: 10.1007/s10930-024-10181-5. Epub 2024 Mar 1.
5. Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum.
bioRxiv. 2024 Jan 23:2024.01.22.576286. doi: 10.1101/2024.01.22.576286.
6. PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes.
BMC Bioinformatics. 2024 Jan 30;25(1):50. doi: 10.1186/s12859-024-05665-1.
7. In silico protein function prediction: the rise of machine learning-based approaches.
Med Rev (2021). 2023 Nov 29;3(6):487-510. doi: 10.1515/mr-2023-0038. eCollection 2023 Dec.
8. Multi-label classification and features investigation of antimicrobial peptides with various functional classes.
iScience. 2023 Oct 18;26(12):108250. doi: 10.1016/j.isci.2023.108250. eCollection 2023 Dec 15.
9. MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad524.
10. Improving automatic GO annotation with semantic similarity.
BMC Bioinformatics. 2022 Dec 12;23(Suppl 2):433. doi: 10.1186/s12859-022-04958-7.