School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China.
Int J Mol Sci. 2023 Jan 22;24(3):2217. doi: 10.3390/ijms24032217.
Thermophilic proteins are valuable in biopharmaceuticals and enzyme engineering. Most existing thermophilic protein prediction models rely on traditional machine learning algorithms and do not fully exploit protein sequence information. To address this limitation, DeepTP, a deep learning model based on self-attention and multiple-channel feature fusion, was proposed to predict thermophilic proteins. First, a large new dataset of 20,842 proteins was constructed. Second, a convolutional neural network and a bidirectional long short-term memory network were used to extract hidden features from protein sequences. Self-attention then assigned different weights to these features, and finally, biological features were integrated to build the prediction model. Compared with existing methods, DeepTP showed better performance and scalability on an independent balanced test set and a validation set, with AUC values of 0.944 and 0.801, respectively. On the unbalanced test set, DeepTP achieved an average precision (AP) of 0.536. The tool is freely available.
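The abstract reports AUC on the balanced sets and average precision (AP) on the unbalanced set. The two metrics respond differently to class imbalance, which a small sketch can illustrate. The functions and toy scores below are illustrative assumptions, not the authors' evaluation code or data: `roc_auc` uses the pairwise-ranking definition of AUC, and `average_precision` averages precision at the rank of each positive, assuming no tied scores.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    Assumes scores are distinct, so the ranking is unambiguous.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy balanced set: 2 positives, 2 negatives (hypothetical scores).
balanced_labels = [1, 1, 0, 0]
balanced_scores = [0.9, 0.6, 0.7, 0.2]

# Toy unbalanced set: same positives, three times as many negatives
# drawn around the same two score levels.
unbalanced_labels = [1, 1, 0, 0, 0, 0, 0, 0]
unbalanced_scores = [0.9, 0.6, 0.70, 0.71, 0.72, 0.20, 0.21, 0.22]

print(round(roc_auc(balanced_labels, balanced_scores), 4))              # 0.75
print(round(roc_auc(unbalanced_labels, unbalanced_scores), 4))          # 0.75
print(round(average_precision(balanced_labels, balanced_scores), 4))    # 0.8333
print(round(average_precision(unbalanced_labels, unbalanced_scores), 4))  # 0.7
```

In this toy case the AUC is unchanged when negatives are added, while AP drops, which is one reason AP is commonly reported for unbalanced test sets.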