School of Technology, Beijing Forestry University, Beijing 100083, China.
Key Lab of State Forestry Administration for Forestry Equipment and Automation, Beijing 100083, China.
Yi Chuan. 2024 Aug;46(8):661-669. doi: 10.16288/j.yczz.24-102.
Identifying enzyme functions is crucial for understanding the mechanisms of biological activity and for advancing the life sciences. However, existing methods for predicting enzyme EC numbers do not fully exploit protein sequence information, and their identification accuracy remains limited. To address this issue, we propose an EC number prediction network using hierarchical features and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, then applies hierarchical feature extraction modules and global feature extraction modules to obtain hierarchical and global features of the sequences. The predictions derived from the two feature types are then combined, and a multitask learning framework is used to predict enzyme EC numbers accurately. In experiments, ECPN-HFGF performed best among the compared methods at predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. By effectively combining hierarchical and global features of protein sequences, ECPN-HFGF enables rapid and accurate EC number prediction. Compared with commonly used current methods, it offers significantly higher prediction accuracy, providing an efficient approach for advancing enzymology research and enzyme engineering applications.
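The two reported metrics weight classes differently. Macro F1 averages the per-class F1 scores, so rare EC classes count as much as common ones, while micro F1 pools the counts over all classes. The sketch below illustrates that difference only; it is not the authors' evaluation code. It assumes one EC label per sequence, and the function name `f1_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1), assuming one label per sample."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: F1 computed from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy, imbalanced example (illustrative labels, not data from the paper):
# eight sequences of a common class and two of a rare class, with one
# rare-class sequence misassigned to the common class.
y_true = ["1.1.1.1"] * 8 + ["3.4.21.4"] * 2
y_pred = ["1.1.1.1"] * 8 + ["3.4.21.4", "1.1.1.1"]
macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the single error on the rare class lowers macro F1 more than micro F1. That is one way a macro score can sit below a micro score on an imbalanced label set.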