State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
Amino Acids. 2021 Feb;53(2):239-251. doi: 10.1007/s00726-021-02941-9. Epub 2021 Jan 23.
Enzymes play important roles in disease diagnosis and in biological function. Feature extraction that truly reflects the intrinsic properties of a protein is the most critical step in the automatic identification of enzymes. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which identifies whether a protein is a human enzyme and, if so, distinguishes its functional class. To improve the feature representation, protein sequences were encoded with a new feature vector called the 'reduced amino acid cluster'. We evaluated 673 reduced amino acid alphabets to determine the optimal feature representation scheme. In the tenfold cross-validation test, IHEC_RAAC identified human enzymes with an accuracy of 74.66% and discriminated human enzyme classes with an accuracy of 54.78%, which were 2.06% and 8.68% higher, respectively, than the state-of-the-art predictors. Additionally, results on an independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes, and so can provide guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac .
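To illustrate the general idea of encoding a sequence with a reduced amino acid alphabet, the following is a minimal Python sketch. It is not the IHEC_RAAC implementation: the five-cluster grouping `EXAMPLE_CLUSTERS`, the function names, and the choice of dipeptide (k = 2) composition as the feature vector are all assumptions made for illustration, and the grouping is not claimed to be one of the 673 alphabets evaluated in the study.

```python
from itertools import product

# Hypothetical 5-cluster reduction of the 20 standard amino acids.
# Illustrative only; not taken from the paper.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]


def build_mapping(clusters):
    """Map every residue to the first letter of its cluster."""
    mapping = {}
    for cluster in clusters:
        representative = cluster[0]
        for residue in cluster:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues missing from the mapping (e.g. 'X') are skipped.
    """
    return "".join(mapping[r] for r in sequence.upper() if r in mapping)


def kmer_composition(reduced, alphabet, k=2):
    """Return normalized k-mer frequencies over the reduced alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = max(len(reduced) - k + 1, 0)
    for i in range(total):
        counts[reduced[i:i + k]] += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_CLUSTERS)
    alphabet = [cluster[0] for cluster in EXAMPLE_CLUSTERS]
    reduced = reduce_sequence("MKVLDFST", mapping)
    features = kmer_composition(reduced, alphabet, k=2)
    print(reduced, len(features))
```

With five clusters, the dipeptide feature vector has 5 × 5 = 25 dimensions instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality reduction that motivates comparing many candidate alphabets.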