Towards structured output prediction of enzyme function.

Author information

Astikainen Katja, Holm Liisa, Pitkänen Esa, Szedmak Sandor, Rousu Juho

Affiliation

Department of Computer Science, PO Box 68, FI-00014 University of Helsinki, Finland.

Publication information

BMC Proc. 2008 Dec 17;2 Suppl 4(Suppl 4):S2. doi: 10.1186/1753-6561-2-s4-s2.

DOI: 10.1186/1753-6561-2-s4-s2
PMID: 19091049
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2654971/
Abstract

BACKGROUND

In this paper we describe work in progress on developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, in which the enzymatic reaction is the combinatorial target object of prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM3) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features, we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences.
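The Background frames enzyme function prediction as structured output prediction over the EC hierarchy. As a rough illustration only, the sketch below encodes each EC number as a 0/1 vector over its hierarchy nodes (one microlabel per level) and trains a simplified kernel model by coordinate ascent on a dual of the form max Σα_i − ½ Σ α_i α_j K_x(i,j) K_y(i,j) with 0 ≤ α_i ≤ C, which is our reading of the bias-free MMR dual. Prediction picks the best-scoring EC number from a candidate set. The toy feature vectors, the linear input and output kernels, the candidate set (EC numbers seen in training), and the helper names `ec_to_nodes`, `label_matrix`, `train_mmr` and `predict` are hypothetical assumptions, not the authors' implementation; HM3 is not shown.

```python
import numpy as np

def ec_to_nodes(ec):
    """Map an EC number like '1.1.1.1' to its hierarchy nodes
    ['1', '1.1', '1.1.1', '1.1.1.1'] (one microlabel per level)."""
    digits = ec.split(".")
    return [".".join(digits[: k + 1]) for k in range(len(digits))]

def label_matrix(ecs, node_index):
    """Encode each EC number as a 0/1 vector over all hierarchy nodes."""
    Y = np.zeros((len(ecs), len(node_index)))
    for i, ec in enumerate(ecs):
        for node in ec_to_nodes(ec):
            Y[i, node_index[node]] = 1.0
    return Y

def train_mmr(Kx, Ky, C=1.0, n_iter=200):
    """Coordinate ascent on the (assumed) MMR-style dual:
    max_a sum_i a_i - 1/2 sum_ij a_i a_j Kx[i,j] Ky[i,j], 0 <= a_i <= C."""
    K = Kx * Ky  # elementwise product of input and output kernels
    a = np.zeros(len(K))
    for _ in range(n_iter):
        for i in range(len(K)):
            grad = 1.0 - K[i] @ a
            if K[i, i] > 0:
                # Exact 1-D maximizer, clipped back into the box [0, C].
                a[i] = np.clip(a[i] + grad / K[i, i], 0.0, C)
    return a

def predict(a, Kx_test, Y_train, Y_cand):
    """Score candidate y by sum_i a_i Kx(x_i, x) <y_i, y>; return best index."""
    scores = (Kx_test * a) @ (Y_train @ Y_cand.T)  # shape (n_test, n_cand)
    return scores.argmax(axis=1)

# Hypothetical toy data: 2-D vectors standing in for sequence features.
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
ecs_train = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1"]
X_test = np.array([[0.95, 0.05], [0.05, 0.95]])

nodes = sorted({n for ec in ecs_train for n in ec_to_nodes(ec)})
node_index = {n: j for j, n in enumerate(nodes)}
Y_train = label_matrix(ecs_train, node_index)
candidates = sorted(set(ecs_train))
Y_cand = label_matrix(candidates, node_index)

Kx = X_train @ X_train.T  # linear input kernel (assumption)
Ky = Y_train @ Y_train.T  # linear output kernel: number of shared nodes
alpha = train_mmr(Kx, Ky, C=1.0)
best = predict(alpha, X_test @ X_train.T, Y_train, Y_cand)
predicted = [candidates[b] for b in best]
print(predicted)  # ['1.1.1.1', '2.7.1.1']
```

With a linear output kernel, K_y(i,j) simply counts the hierarchy nodes two EC numbers share, so examples with partially matching EC codes still inform each other during training.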

RESULTS

In our experiments, when predicting the EC classification of enzymes, we obtain over 85% accuracy (predicting the full four-digit EC code) and over 91% microlabel F1 score (predicting the individual EC digits). When predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting the family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, the structured output methods are significantly more accurate than a nearest-neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction.
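The two metrics in the Results reward different things: accuracy counts only exact matches of the full EC code, while microlabel F1 gives partial credit for every correctly predicted level. A minimal sketch, assuming microlabels are the hierarchy nodes of an EC number and F1 is micro-averaged over all of them (the paper's exact definition may differ). The example EC numbers are made up for illustration.

```python
def ec_nodes(ec):
    """Set of hierarchy nodes (microlabels) of an EC number, e.g. '2.7.1.1'."""
    digits = ec.split(".")
    return {".".join(digits[: k + 1]) for k in range(len(digits))}

def exact_accuracy(true_ecs, pred_ecs):
    """Fraction of examples whose full four-digit EC code is predicted exactly."""
    return sum(t == p for t, p in zip(true_ecs, pred_ecs)) / len(true_ecs)

def microlabel_f1(true_ecs, pred_ecs):
    """Micro-averaged F1 over hierarchy nodes, pooled across all examples."""
    tp = fp = fn = 0
    for t, p in zip(true_ecs, pred_ecs):
        T, P = ec_nodes(t), ec_nodes(p)
        tp += len(T & P)
        fp += len(P - T)
        fn += len(T - P)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions: one exact hit, one wrong only at the last
# digit, one wrong from the first digit on.
true_ecs = ["1.1.1.1", "2.7.1.1", "3.1.3.2"]
pred_ecs = ["1.1.1.1", "2.7.1.2", "4.2.1.1"]
print(exact_accuracy(true_ecs, pred_ecs))           # 0.3333333333333333
print(round(microlabel_f1(true_ecs, pred_ecs), 4))  # 0.5833
```

Here 7 of the 12 predicted nodes are correct, so microlabel F1 (7/12) exceeds exact accuracy (1/3). This is why the microlabel F1 figures in the Results sit above the corresponding accuracy figures.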

CONCLUSION

Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
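The Results report that a polynomial kernel over the GTG features was a prerequisite for accurate prediction, and that adding string kernels helped slightly for EC classes. The sketch below shows one common way to combine such kernels, assuming cosine-normalized kernels are simply summed (the abstract does not say how the authors combined or weighted them). The 3-spectrum kernel stands in for "various string kernels", and `X_gtg` is a hypothetical placeholder for GTG feature vectors, whose construction is not described here.

```python
from collections import Counter

import numpy as np

def spectrum_features(seq, k=3):
    """Counts of all length-k substrings (the k-spectrum) of a sequence."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seqs, k=3):
    """Inner products of k-spectrum count vectors (one example string kernel)."""
    feats = [spectrum_features(s, k) for s in seqs]
    n = len(seqs)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Counter returns 0 for k-mers absent from feats[j].
            K[i, j] = sum(c * feats[j][kmer] for kmer, c in feats[i].items())
    return K

def polynomial_kernel(X, degree=2, c=1.0):
    """(x . x' + c)^degree over fixed-length feature vectors."""
    return (X @ X.T + c) ** degree

def normalize(K):
    """Cosine normalization: every example gets unit self-similarity."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Hypothetical toy inputs: short protein fragments and made-up
# GTG-like feature vectors (not real GTG features).
seqs = ["MKVLAAGIVG", "MKVLAAGLVG", "GSHMTEYKLV"]
X_gtg = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 1.0], [0.0, 3.0, 0.0]])

# Sum of normalized kernels; still a valid (positive semidefinite) kernel.
K_combined = normalize(polynomial_kernel(X_gtg)) + normalize(spectrum_kernel(seqs))
print(np.round(K_combined, 3))
```

Normalizing before summing keeps one kernel from dominating merely because its raw values are larger; a weighted sum is an equally plausible alternative.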
