Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.
Computer Science Department, Illinois Institute of Technology, Chicago, IL 60616, USA.
Bioinformatics. 2018 Mar 1;34(5):760-769. doi: 10.1093/bioinformatics/btx680.
Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and the diagnosis of diseases caused by enzyme deficiency. However, the time and resources required make it prohibitively expensive to determine the function of every enzyme experimentally. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, which determines enzyme function by predicting the Enzyme Commission (EC) number.
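An EC number is a four-level hierarchy (main class, subclass, sub-subclass, serial number), so predicting it can be framed as predicting one level at a time, starting from the main class. The minimal sketch below only parses an EC string and names its main class; `parse_ec`, `main_class`, and `MAIN_CLASSES` are hypothetical helper names for illustration and are not part of DEEPre.

```python
from typing import Optional, Tuple

# Top-level EC classes. Class 7 (Translocases) was added by IUBMB in 2018,
# so tools trained on earlier data may only know classes 1-6.
MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> Tuple[Optional[int], ...]:
    """Split e.g. 'EC 3.4.21.4' into (3, 4, 21, 4).

    A '-' marks an unassigned level and becomes None. Preliminary
    serial numbers with an 'n' prefix (e.g. '3.5.1.n3') are not handled.
    """
    ec = ec.strip()
    if ec.upper().startswith("EC"):
        ec = ec[2:].strip()
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated levels, got {ec!r}")
    return tuple(None if p == "-" else int(p) for p in parts)


def main_class(ec: str) -> str:
    """Return the name of the first (main-class) level of an EC number."""
    level1 = parse_ec(ec)[0]
    return MAIN_CLASSES.get(level1, "unknown")
```

For example, `parse_ec("EC 3.4.21.4")` (trypsin) yields `(3, 4, 21, 4)` and `main_class("3.4.21.4")` yields `"Hydrolases"`, which is the level the low-homology server comparison evaluates.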
We propose DEEPre, an end-to-end approach to enzyme function prediction that jointly performs feature selection and classification model training, together with an automatic and robust method for feature dimensionality uniformization. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as input and extracts convolutional and sequential features from it based on the classification result, so that the learned features directly improve prediction performance. Thorough cross-validation experiments on two large-scale datasets show that DEEPre outperforms the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional differences between enzyme isoforms.
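The abstract says the model works on raw sequence encodings rather than hand-crafted features. As a rough illustration only, and not DEEPre's actual architecture (whose filters are learned and whose input encodings are not detailed here), the sketch below one-hot encodes a protein sequence and applies 1D convolution filters followed by ReLU and global max pooling. It also shows one generic way a variable-length sequence can be mapped to a fixed-size feature vector, the kind of problem a dimensionality uniformization step has to solve. The function names and the hand-set, motif-shaped filters are hypothetical; the sequential (recurrent) part of the model is not covered.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_CHANNELS = len(AMINO_ACIDS)

Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode a sequence as an L x 20 one-hot matrix.

    Non-standard residues (e.g. 'X') become all-zero rows.
    """
    rows = []
    for aa in seq.upper():
        row = [0.0] * N_CHANNELS
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def conv1d_maxpool(x: Matrix, filters: List[Matrix]) -> List[float]:
    """Slide each k x 20 filter along x, apply ReLU, then global max-pool.

    The output has one value per filter, independent of sequence length.
    A sequence shorter than a filter yields 0.0 for that filter.
    """
    features = []
    for w in filters:
        k = len(w)
        best = 0.0  # ReLU floor: max(0, window scores)
        for start in range(len(x) - k + 1):
            score = sum(
                w[j][c] * x[start + j][c]
                for j in range(k)
                for c in range(N_CHANNELS)
            )
            best = max(best, score)
        features.append(best)
    return features


# Hypothetical hand-set filters that score matches to short motifs.
filters = [one_hot("HE"), one_hot("GHE")]
print(conv1d_maxpool(one_hot("MKHEAGHE"), filters))
print(conv1d_maxpool(one_hot("MKHEAGHE" + "A" * 50), filters))
```

Both calls print `[2.0, 3.0]`: each filter contributes exactly one feature no matter how long the sequence is. In a trained network the filter weights would be learned from the classification loss rather than set by hand.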
The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
Supplementary data are available at Bioinformatics online.