Pan Shuyi, Xiang Xiaoyang, Yan Qunfang, Ding Yanrui
School of Science, Jiangnan University, Wuxi, Jiangsu 214122, P. R. China.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2025 Aug 25;42(4):817-823. doi: 10.7507/1001-5515.202501045.
Protein structure determines function, and structural information is critical for predicting protein thermostability. This study proposes a novel method for protein thermostability prediction that integrates graph embedding features with network topological features. We constructed residue interaction networks (RINs) to characterize protein structures, calculated their network topological features, and used deep neural networks (DNN) to mine the inherent characteristics of those features. Using the DeepWalk and Node2vec algorithms, we obtained node embeddings and extracted graph embedding features through a TopN strategy combined with bidirectional long short-term memory (BiLSTM) networks. Additionally, we introduced the Doc2vec algorithm to replace the Word2vec module in the graph embedding algorithms, generating graph embedding feature vector encodings. By employing an attention mechanism to fuse the graph embedding features with the network topological features, we constructed a prediction model that achieved 87.85% prediction accuracy on a bacterial protein dataset. Furthermore, we analyzed how the contributions of individual network topological features to the model differ and how the various graph embedding methods compare, and found that combining DeepWalk features with Doc2vec and all topological features was crucial for identifying thermostable proteins. This study provides a practical and effective new method for protein thermostability prediction, and offers theoretical guidance for exploring protein diversity, discovering new thermostable proteins, and the intelligent modification of mesophilic proteins.
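The first steps of the pipeline described above (building a RIN, computing topological features, and sampling DeepWalk-style walks) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `build_rin`, `degree`, `clustering`, and `random_walks` are hypothetical, residues are linked by a simple distance cutoff between one representative point per residue (the abstract does not state how interactions are defined), and only two topological features are shown. The walks are uniform random walks in the spirit of DeepWalk; passing them to a Word2vec or Doc2vec model to obtain embeddings is not shown.

```python
import math
import random


def build_rin(coords, cutoff=7.0):
    """Build a residue interaction network as adjacency sets.

    Assumption: two residues interact if their representative points
    (one 3D coordinate per residue) lie within `cutoff` of each other.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree(adj):
    """Number of neighbours of each residue."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering(adj):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    out = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            out[v] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        out[v] = 2.0 * links / (k * (k - 1))
    return out


def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Uniform random walks from every node (DeepWalk-style sampling).

    A walk stops early if it reaches a node with no neighbours.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = sorted(adj[walk[-1]])
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


# Toy input: three nearby residues and one distant residue (made-up coordinates).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (20.0, 0.0, 0.0)]
toy_rin = build_rin(toy_coords, cutoff=5.0)
toy_walks = random_walks(toy_rin, walk_length=4, walks_per_node=2, seed=1)
```

In this toy network the three nearby residues form a triangle and the fourth is isolated, so its walks contain only the starting node. In a fuller pipeline each walk would be treated as a "sentence" of residue identifiers for the embedding model, and the per-residue topological values would be aggregated into a protein-level feature vector.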