Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

Authors

Wang Yong-Cui, Wang Yong, Yang Zhi-Xia, Deng Nai-Yang

Affiliation

College of Mathematics and System Science, Xinjiang University, Urumuchi, China.

Publication details

BMC Syst Biol. 2011 Jun 20;5 Suppl 1(Suppl 1):S6. doi: 10.1186/1752-0509-5-S1-S6.

DOI: 10.1186/1752-0509-5-S1-S6

PMID: 21689481

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3121122/

Abstract

BACKGROUND

Enzymes are known as the largest class of proteins, and their functions are usually annotated by the Enzyme Commission (EC) scheme, which uses a hierarchical structure, i.e., four numbers separated by periods, to classify enzyme function. Automatically assigning an enzyme to its place in the EC hierarchy is crucial to understanding its specific molecular mechanism.
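As a small illustration of the hierarchy described above, the sketch below splits an EC number into its four nested levels. The function name `ec_prefixes` is hypothetical and not part of the paper; EC 1.1.1.1 (alcohol dehydrogenase) is used only as a familiar example.

```python
def ec_prefixes(ec_number: str) -> list[str]:
    """Return the nested prefixes of an EC number, from top-level class to full number."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four period-separated fields, got {ec_number!r}")
    # Each prefix is one node on the path from the root of the EC hierarchy.
    return [".".join(parts[:depth]) for depth in range(1, 5)]


print(ec_prefixes("1.1.1.1"))  # ['1', '1.1', '1.1.1', '1.1.1.1']
```

A classifier that respects the hierarchy can treat each prefix as a label at one level, so a prediction of the first three digits corresponds to the third element of this list.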

RESULTS

In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. We then develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF achieves accuracy and MCC ranging from 81% to 98% and from 0.82 to 0.98, respectively, when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms methods that do not take account of the hierarchical relationship among enzyme categories, as well as alternative methods that incorporate prior knowledge about inter-class relationships.
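The conjoint triad idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the seven-class amino acid grouping and the (f - min) / max scaling are assumptions taken from the commonly used conjoint triad scheme, since the abstract does not spell them out, and the name `conjoint_triad_feature` is hypothetical.

```python
from itertools import product

# Assumed seven-class amino acid grouping from the commonly used conjoint
# triad scheme; the abstract itself does not list the classes.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, members in enumerate(GROUPS) for aa in members}

# All 7 * 7 * 7 = 343 ordered triads of class indices, in a fixed order.
TRIADS = list(product(range(len(GROUPS)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad_feature(sequence: str) -> list[float]:
    """Map a protein sequence to a 343-dimensional triad frequency vector."""
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:  # skip windows containing non-standard residues
            continue
        counts[TRIAD_INDEX[window]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:
        return [0.0] * len(TRIADS)
    # Assumed scaling, (f - min) / max, to reduce the effect of sequence length.
    return [(c - lo) / hi for c in counts]


vec = conjoint_triad_feature("MKVLAAGIVGLLLAQ")
print(len(vec))  # 343
```

Unlike plain amino acid composition, each entry counts a residue class together with its two neighbors, which is how the encoding captures neighbor relationships. The resulting vector could be passed to any standard SVM implementation; how SVMHL uses the EC hierarchy is not specified in the abstract and is not sketched here.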

CONCLUSIONS

Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore, our new method will be a useful tool for the enzyme function prediction community.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8875/3121122/f8d1a160b205/1752-0509-5-S1-S6-1.jpg

Similar articles

Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature.

Protein Pept Lett. 2010 Nov;17(11):1441-9. doi: 10.2174/0929866511009011441.

Accurate prediction of nuclear receptors with conjoint triad feature.

BMC Bioinformatics. 2015 Dec 3;16:402. doi: 10.1186/s12859-015-0828-1.

Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.

BMC Bioinformatics. 2018 Sep 21;19(1):334. doi: 10.1186/s12859-018-2368-y.

EC number prediction of protein sequences based on combination of hierarchical and global features.

Yi Chuan. 2024 Aug;46(8):661-669. doi: 10.16288/j.yczz.24-102.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Prediction of the beta-hairpins in proteins using support vector machine.

Protein J. 2008 Feb;27(2):115-22. doi: 10.1007/s10930-007-9114-z.

Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction.

Comput Biol Chem. 2012 Oct;40:15-9. doi: 10.1016/j.compbiolchem.2012.06.003. Epub 2012 Aug 14.

Prediction of enzyme classification from protein sequence without the use of sequence similarity.

Proc Int Conf Intell Syst Mol Biol. 1997;5:92-9.

Cited by

Prediction of enzyme function using an interpretable optimized ensemble learning framework.

Chem Sci. 2025 Sep 1. doi: 10.1039/d5sc04513d.

In silico protein function prediction: the rise of machine learning-based approaches.

Med Rev (2021). 2023 Nov 29;3(6):487-510. doi: 10.1515/mr-2023-0038. eCollection 2023 Dec.

Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction.

Front Genet. 2020 Oct 9;11:569100. doi: 10.3389/fgene.2020.569100. eCollection 2020.

Identifying Heat Shock Protein Families from Imbalanced Data by Using Combined Features.

Comput Math Methods Med. 2020 Sep 23;2020:8894478. doi: 10.1155/2020/8894478. eCollection 2020.

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.

BMC Bioinformatics. 2018 Sep 21;19(1):334. doi: 10.1186/s12859-018-2368-y.

DEEPre: sequence-based enzyme EC number prediction by deep learning.

Bioinformatics. 2018 Mar 1;34(5):760-769. doi: 10.1093/bioinformatics/btx680.

Accurate prediction of nuclear receptors with conjoint triad feature.

BMC Bioinformatics. 2015 Dec 3;16:402. doi: 10.1186/s12859-015-0828-1.

A Learning Framework of Nonparallel Hyperplanes Classifier.

ScientificWorldJournal. 2015;2015:497617. doi: 10.1155/2015/497617. Epub 2015 Jun 16.

Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism.

BMC Genomics. 2015;16 Suppl 7(Suppl 7):S16. doi: 10.1186/1471-2164-16-S7-S16. Epub 2015 Jun 11.

DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe.

BMC Bioinformatics. 2015 Mar 21;16:96. doi: 10.1186/s12859-015-0499-y.
