Lan Qiuhong, Zheng Zhongtuan, Tang Zhen, Qiu Xuehua, Yin Zhixiang
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, China.
Institute for Frontier Medical Technology, Shanghai Frontiers Science Research Center for Druggability of Cardiovascular Noncoding RNA, Center of Intelligent Computing and Applied Statistics, Shanghai University of Engineering Science, Shanghai, China.
PLoS One. 2025 Jul 2;20(7):e0326960. doi: 10.1371/journal.pone.0326960. eCollection 2025.
Protein-protein interactions are essential for cellular processes in all organisms. Accurate in-silico identification of these interactions is a significant area of research in biology-related fields and is crucial for protein function prediction and drug design. Protein sequence data serve as the primary source for computational protein prediction. However, existing sequence-based models for predicting protein-protein interactions typically consider only a limited set of physicochemical properties of amino acids. Consequently, they fail to characterize protein sequence information comprehensively, and the resulting models perform well within the species on which they were trained but poorly in cross-species settings. Unlike previous models, this paper combines the SVHEHS descriptor with various feature coding techniques to characterize protein sequences more comprehensively. The model employs explicit integration of bidirectional gated recurrent units to fuse the multiple sources of information. The final model achieves prediction accuracies of 96.47% and 97.79% on the H. pylori and S. cerevisiae datasets, respectively, outperforming most current models reported in the literature. In particular, the experimental results indicate that the model generalizes well across datasets from various species, suggesting it can serve as a valuable reference for investigating protein interaction networks in different species.
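To make the idea of "feature coding" of a protein sequence concrete, the following is a minimal sketch of one common scheme, amino acid composition (AAC), which maps a sequence to a fixed-length vector of residue frequencies. The abstract does not say which coding techniques the paper uses, so AAC here is an assumption chosen for illustration only; it is not the SVHEHS descriptor, and the function names `amino_acid_composition` and `encode_pair` are hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the 20-dimensional residue-frequency vector of a sequence.

    Illustrative only: AAC is a generic sequence encoding, not the
    SVHEHS descriptor described in the paper.
    """
    # Count only standard residues; ambiguous symbols such as 'X' are skipped.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(seq_a: str, seq_b: str) -> List[float]:
    """Concatenate the encodings of two proteins into one feature vector
    (hypothetical helper; pair representation is an assumption)."""
    return amino_acid_composition(seq_a) + amino_acid_composition(seq_b)
```

A candidate interacting pair would then be represented by `encode_pair(seq_a, seq_b)`, a 40-dimensional vector that a downstream classifier could consume. A richer descriptor would replace or extend these frequencies with physicochemical information.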