Efficient link prediction in the protein-protein interaction network using topological information in a generative adversarial network machine learning model.

Affiliations

Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Nagyvárad tér 4, Budapest, 1089, Hungary.

Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.

Publication information

BMC Bioinformatics. 2022 Feb 19;23(1):78. doi: 10.1186/s12859-022-04598-x.

DOI:10.1186/s12859-022-04598-x

PMID:35183129

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8858570/

Abstract

BACKGROUND

Investigating possible interactions between two proteins in intracellular signaling is expensive and laborious in the wet lab; therefore, several in silico approaches have been implemented to narrow down the candidates for future experimental validation. Reformulated in terms of network theory, the set of proteins can be represented as the nodes of a network and the interactions between them as its edges. The resulting protein-protein interaction (PPI) network enables the use of link prediction techniques to discover new probable connections. Here, we aimed to offer a novel approach to the link prediction task in PPI networks, utilizing a generative machine learning model.
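As a minimal sketch of this reformulation, a PPI network can be stored as an adjacency map, and the candidates for link prediction are the node pairs that are not yet connected. The protein names, the toy interactions and the helper names `build_network` and `candidate_links` are made up for illustration; they are not taken from STRING or from the authors' code.

```python
from itertools import combinations

# Hypothetical toy interactions; real data would come from a PPI database.
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]


def build_network(edges):
    """Return an undirected adjacency map: protein -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def candidate_links(adjacency):
    """List unordered node pairs that have no known edge between them."""
    nodes = sorted(adjacency)
    return [(u, v) for u, v in combinations(nodes, 2) if v not in adjacency[u]]


network = build_network(interactions)
candidates = candidate_links(network)
print(candidates)
```

A link prediction method then assigns a score to each candidate pair, and the highest-ranked pairs are the ones proposed for experimental validation.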

RESULTS

We created a tool that consists of two modules: a data processing framework and a machine learning model. For data processing, we used a modified breadth-first search algorithm to traverse the network and extract induced subgraphs, which served as image-like input data for our model. For machine learning, we used a conditional generative adversarial network (cGAN) model inspired by image-to-image translation, with a Wasserstein distance-based loss improved with gradient penalty; it takes the combined representation from the data processing as input, and its generator is trained to predict the probable unknown edges in the provided induced subgraphs. Our link prediction tool was evaluated on the PPI networks of five different species from the STRING database by calculating the area under the receiver operating characteristic curve, the area under the precision-recall curve and the normalized discounted cumulative gain (AUROC, AUPRC and NDCG, respectively). Test runs yielded AUROC = 0.915, AUPRC = 0.176 and NDCG = 0.763, averaged over all investigated species.
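The data processing step can be illustrated with a plain breadth-first search. The abstract does not describe the authors' modification, so the sketch below is an assumption, not their algorithm: the helper names `bfs_nodes` and `induced_adjacency`, the fixed `size` parameter and the toy network are all hypothetical. It collects a fixed number of nodes around a start node and returns the adjacency matrix of the induced subgraph, which is the kind of square, image-like array an image-to-image model could take as input.

```python
from collections import deque


def bfs_nodes(adjacency, start, size):
    """Collect up to `size` nodes in breadth-first order, beginning at `start`."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < size:
        node = queue.popleft()
        for neighbour in sorted(adjacency[node]):  # sorted for a deterministic order
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
                if len(visited) == size:
                    break
    return visited


def induced_adjacency(adjacency, nodes):
    """Return the 0/1 adjacency matrix of the subgraph induced by `nodes`."""
    return [[1 if v in adjacency[u] else 0 for v in nodes] for u in nodes]


# Hypothetical toy network (protein -> set of interaction partners).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

nodes = bfs_nodes(toy, "A", 4)
matrix = induced_adjacency(toy, nodes)
for row in matrix:
    print(row)
```

Because the subgraph is induced, the matrix keeps every known edge among the selected nodes and leaves out edges to nodes outside the selection, such as D-E here. Zeros in the matrix are the positions where a generator could propose unknown edges.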

CONCLUSION

We developed software for link prediction in PPI networks utilizing machine learning. Its evaluation serves as the first demonstration that a cGAN model, conditioned on raw topological features of the PPI network, is an applicable solution for the PPI prediction problem without requiring molecular node attributes, which are often unavailable. The corresponding scripts are available on GitHub: https://github.com/semmelweis-pharmacology/ppi_pred

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8200/8858570/bb74c5c2b32d/12859_2022_4598_Fig1_HTML.jpg
