Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India.
Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, Odisha, India.
Stat Appl Genet Mol Biol. 2024 Jul 1;23(1). doi: 10.1515/sagmb-2024-0004. eCollection 2024 Jan 1.
Predicting a protein's function from its amino acid sequence alone is a crucial but difficult task in bioinformatics. In recent years, deep learning has emerged as a powerful tool for this problem and has achieved significant success in protein function prediction. Its strength lies in automatically learning informative features from protein sequences, which are then used to predict the protein's function. This study builds on these advances by proposing a novel model, CNN-CBAM+BiGRU, which combines a convolutional neural network (CNN) equipped with a Convolutional Block Attention Module (CBAM) with bidirectional gated recurrent units (BiGRUs). CBAM guides the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of recurrent neural network (RNN), capture long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU, and experiments demonstrate the effectiveness of this combined approach. On the human dataset, the proposed method outperforms the CNN-BIGRU+ATT model by 1.0 % for cellular components, 1.1 % for molecular functions, and 0.5 % for biological processes. On the yeast dataset, it outperforms the CNN-BIGRU+ATT model by 2.4 % for cellular components, 1.2 % for molecular functions, and 0.6 % for biological processes.
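The data flow described in the abstract (convolutional features, reweighted by CBAM, then read in both directions by a GRU) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, kernel widths, or training details, so every dimension, the random weights, and the helper names (`cbam_1d`, `gru_pass`, `bigru`, `init_gru`) are assumptions made for illustration. CBAM was introduced for 2D image feature maps; the 1D variant here, which pools over sequence positions and over channels, is likewise an assumed adaptation, and the GRU omits bias terms for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w1, w2):
    """feats: (channels, length). Returns weights of shape (channels, 1)."""
    avg = feats.mean(axis=1)                     # average-pool over positions
    mx = feats.max(axis=1)                       # max-pool over positions
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0) # shared two-layer MLP with ReLU
    return sigmoid(mlp(avg) + mlp(mx))[:, None]

def spatial_attention(feats, kernel):
    """kernel: (2, k) with k odd. Returns weights of shape (1, length)."""
    pooled = np.stack([feats.mean(axis=0), feats.max(axis=0)])  # (2, length)
    k = kernel.shape[1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2)))         # zero padding
    scores = np.array([np.sum(padded[:, i:i + k] * kernel)
                       for i in range(feats.shape[1])])
    return sigmoid(scores)[None, :]

def cbam_1d(feats, w1, w2, kernel):
    """Apply channel attention first, then positional (spatial) attention."""
    feats = feats * channel_attention(feats, w1, w2)
    return feats * spatial_attention(feats, kernel)

def init_gru(rng, in_dim, hidden):
    """Hypothetical helper: small random weights for the z, r, n gates."""
    p = {}
    for g in "zrn":
        p["W" + g] = rng.normal(size=(hidden, in_dim)) * 0.1
        p["U" + g] = rng.normal(size=(hidden, hidden)) * 0.1
    return p

def gru_pass(xs, p):
    """xs: (length, in_dim). Returns hidden states of shape (length, hidden)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)        # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)        # reset gate
        n = np.tanh(p["Wn"] @ x + r * (p["Un"] @ h))  # candidate state
        h = (1.0 - z) * n + z * h
        states.append(h)
    return np.stack(states)

def bigru(xs, fwd, bwd):
    """Concatenate a left-to-right pass with a right-to-left pass."""
    f = gru_pass(xs, fwd)
    b = gru_pass(xs[::-1], bwd)[::-1]   # re-reverse so positions align
    return np.concatenate([f, b], axis=1)

rng = np.random.default_rng(0)
C, L, r, k, H = 8, 20, 2, 7, 4          # assumed sizes, not from the paper
feats = rng.normal(size=(C, L))         # stand-in for CNN output on a sequence
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
kernel = rng.normal(size=(2, k)) * 0.1

attended = cbam_1d(feats, w1, w2, kernel)              # (C, L)
seq_repr = bigru(attended.T, init_gru(rng, C, H), init_gru(rng, C, H))
print(attended.shape, seq_repr.shape)
```

Because both attention maps pass through a sigmoid, CBAM can only scale features down, never amplify them, which is why it behaves as a soft selection over channels and positions. In a full model the BiGRU output would feed a classifier over function labels, a step the abstract does not detail and which is therefore left out here.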