Computational Biology Research Lab (CBRL), Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan.
Bioinformatics. 2020 Nov 1;36(17):4583-4589. doi: 10.1093/bioinformatics/btaa536.
Understanding an enzyme's function is one of the most crucial problem domains in computational biology. Enzymes are a key component of all organisms and of many industrial processes, because they help fight diseases and speed up essential chemical reactions. Because enzymes have such wide applications, the discovery of new enzymatic proteins can accelerate biological research and increase commercial productivity. However, biological experiments to determine an enzyme's function are time-consuming and resource-intensive.
In this study, we propose a novel computational approach to predict an enzyme's function up to the fourth level of the Enzyme Commission (EC) number. Many studies have attempted to predict enzyme function, yet no approach has properly tackled the fourth and final level of the EC number. This level is especially significant because it gives the most specific information about how an enzyme performs its function. Our method combines deep learning with an efficient hierarchical classification scheme to predict an enzyme's precise function. On a dataset of 11 353 enzymes and 402 classes, we achieved a hierarchical accuracy of 91.2% and a Macro-F1 score of 81.9% at the fourth level. Moreover, our method can predict the function of enzyme isoforms with considerable success. This methodology is broadly applicable to genome-wide prediction, which can in turn enable automated annotation of enzyme databases and the identification of better or cheaper enzymes for commercial use.
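The four-level EC hierarchy and the Macro-F1 metric mentioned above can be made concrete with a short sketch. This is not the authors' code or their hierarchical classifier. It only assumes that an EC number is written as four dot-separated fields (for example 2.7.11.1) and that Macro-F1 is the unweighted mean of per-class F1 scores. The helper names `ec_levels`, `macro_f1` and `level_macro_f1` are hypothetical, and the sketch does not reproduce the paper's "hierarchical accuracy", whose exact definition is not given here.

```python
from collections import defaultdict


def ec_levels(ec):
    """Split an EC number into its four nested prefixes (hypothetical helper).

    Example: '2.7.11.1' -> ['2', '2.7', '2.7.11', '2.7.11.1'].
    Assumes exactly four dot-separated fields.
    """
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four EC fields, got {ec!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def level_macro_f1(y_true, y_pred, level):
    """Macro-F1 after truncating true and predicted EC numbers to `level` (1-4)."""
    trunc = lambda ecs: [ec_levels(ec)[level - 1] for ec in ecs]
    return macro_f1(trunc(y_true), trunc(y_pred))


if __name__ == "__main__":
    y_true = ["1.1.1.1", "1.1.1.2", "2.7.11.1"]
    y_pred = ["1.1.1.1", "1.1.1.1", "2.7.11.1"]
    for lvl in range(1, 5):
        print(lvl, round(level_macro_f1(y_true, y_pred, lvl), 4))
```

In this toy example every prediction is correct down to the third level, so Macro-F1 is 1.0 for levels 1 to 3. At the fourth level the confusion between 1.1.1.1 and 1.1.1.2 lowers it to about 0.556. Because Macro-F1 weights every class equally, rare fourth-level classes affect it as much as common ones, which helps explain why it can sit well below an accuracy-style figure.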
The web-server can be freely accessed at http://hecnet.cbrlab.org/.
Supplementary data are available at Bioinformatics online.