Automatic single- and multi-label enzymatic function prediction by machine learning.

Authors

Amidi Shervine, Amidi Afshine, Vlachakis Dimitrios, Paragios Nikos, Zacharaki Evangelia I

Affiliations

Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France.

MDAKM Group, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece.

Publication

PeerJ. 2017 Mar 29;5:e3095. doi: 10.7717/peerj.3095. eCollection 2017.

Abstract

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance, since they provide the means to better understand how newly discovered enzymes behave when catalyzing chemical reactions. Until now, enzymatic function has mostly been predicted with single-label classification, which limits the application to enzymes performing a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the enzyme commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
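
The abstract quotes two kinds of multi-label figures: one based on Hamming loss, and one that counts an enzyme as correct only when all of its reactions are predicted. The sketch below illustrates the difference on a small invented example. It is not the authors' code: the label matrices are hypothetical, the function names are chosen here for illustration, and it assumes that a Hamming-based accuracy means 1 minus the Hamming loss over the six main EC classes.

```python
# Toy illustration only: the labels below are invented, not taken from the
# paper's dataset. Each row is one enzyme, each column one of the six main
# EC classes (first digit of the EC code); 1 means the enzyme has that function.
from typing import List

LabelMatrix = List[List[int]]


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of (enzyme, class) slots whose predicted label is wrong."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def exact_match_ratio(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of enzymes whose full label set is predicted exactly."""
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical ground truth: enzymes 2 and 4 are multi-functional.
Y_TRUE = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],
]

# Hypothetical predictions: one function of enzyme 2 is missed.
Y_PRED = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],
]

if __name__ == "__main__":
    loss = hamming_loss(Y_TRUE, Y_PRED)
    print(f"Hamming loss:          {loss:.4f}")
    print(f"1 - Hamming loss:      {1 - loss:.4f}")
    print(f"Exact-match ratio:     {exact_match_ratio(Y_TRUE, Y_PRED):.4f}")
```

In this toy case a single missed label out of 24 slots barely moves the Hamming-based score, while the exact-match score drops by a full enzyme. This is one plausible reason why a per-label figure can sit well above an all-reactions figure for the same model.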

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e795/5374972/c662fb182b1d/peerj-05-3095-g001.jpg
