Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China.
Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Solna 17121, Sweden.
Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad052.
Protein-protein interaction (PPI) networks and transcriptional regulatory networks are critical in regulating cells and their signaling. A thorough understanding of PPIs can provide further insight into cellular physiology in normal and disease states. Although numerous methods have been proposed to predict PPIs, predicting interactions between unknown proteins remains challenging. In this study, a novel neural network named AFTGAN was constructed to predict multi-type PPIs. For the input features, ESM-1b embeddings, which carry rich biological information about proteins, were added as a protein sequence feature alongside amino acid co-occurrence similarity and one-hot encoding. For the network framework, an ensemble network was built from a transformer encoder containing an AFT module (which weights the vital protein sequence feature information) and a graph attention network (which extracts the relational features of protein pairs).
The experimental results showed that, under three partitioning schemes (BFS, DFS and random), AFTGAN achieved Micro-F1 scores of 0.685, 0.711 and 0.867 on the SHS27K dataset and 0.745, 0.819 and 0.920 on the SHS148K dataset, respectively, all higher than those of other popular methods. In addition, experimental comparisons on the STRING dataset confirmed the superior performance of the proposed model in predicting PPIs of unknown proteins.
The source code is publicly available at https://github.com/1075793472/AFTGAN.
Supplementary data are available at Bioinformatics online.
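For readers unfamiliar with the Micro-F1 metric reported in the results, the following is a minimal sketch of how it is commonly computed for multi-label predictions, where each protein pair may carry several interaction types at once. It is not taken from the AFTGAN source code: the function name `micro_f1`, the binary label layout (one row per protein pair, one column per interaction type) and the toy labels are assumptions made for illustration only.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over binary multi-label predictions.

    Each row is one protein pair; each column is one interaction type
    (hypothetical layout, assumed for this sketch).
    """
    tp = fp = fn = 0
    # Pool true positives, false positives and false negatives over
    # every (pair, interaction type) cell before computing F1.
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positive labels or predictions at all.
    return 2 * tp / denom if denom else 0.0


# Toy example: two protein pairs, three hypothetical interaction types.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
```

Because the counts are pooled across all interaction types before the ratio is taken, frequent interaction types influence Micro-F1 more than rare ones, in contrast to a macro average that weights each type equally.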