

Efficient link prediction in the protein-protein interaction network using topological information in a generative adversarial network machine learning model.

Author information

Cardiometabolic and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Nagyvárad tér 4, Budapest, 1089, Hungary.

Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.

Publication information

BMC Bioinformatics. 2022 Feb 19;23(1):78. doi: 10.1186/s12859-022-04598-x.

Abstract

BACKGROUND

Investigating possible interactions between two proteins in intracellular signaling is expensive and laborious in the wet lab; several in silico approaches have therefore been implemented to narrow down the candidates for future experimental validation. Reformulated in terms of network theory, the set of proteins can be represented as the nodes of a network and the interactions between them as its edges. The resulting protein-protein interaction (PPI) network enables the use of link prediction techniques to discover new probable connections. Here we aimed to offer a novel approach to the link prediction task in PPI networks, utilizing a generative machine learning model.
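The network formulation above can be illustrated with a minimal sketch. The protein names and interactions below are hypothetical placeholders rather than STRING data, and the helper names `build_adjacency` and `candidate_links` are introduced here for illustration only. The sketch shows how known interactions become an undirected graph and how the unconnected node pairs form the candidate set that a link prediction method would then rank.

```python
from itertools import combinations

# Hypothetical toy interactions; the names are placeholders, not STRING data.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def candidate_links(adjacency):
    """List every unordered node pair that is not yet connected."""
    nodes = sorted(adjacency)
    return [(u, v) for u, v in combinations(nodes, 2) if v not in adjacency[u]]


adjacency = build_adjacency(interactions)
candidates = candidate_links(adjacency)
print(candidates)
```

In a real PPI network the number of candidate pairs grows quadratically with the number of proteins, which is one reason a predictor that ranks candidates is useful before any experimental validation.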

RESULTS

We created a tool that consists of two modules: a data processing framework and a machine learning model. For data processing, we used a modified breadth-first search algorithm to traverse the network and extract induced subgraphs, which served as image-like input data for our model. For machine learning, we used a conditional generative adversarial network (cGAN) model inspired by image-to-image translation, with a Wasserstein distance-based loss improved with gradient penalty. The model takes the combined representation from the data processing as input, and its generator is trained to predict the probable unknown edges in the provided induced subgraphs. Our link prediction tool was evaluated on the protein-protein interaction networks of five different species from the STRING database by calculating the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and the normalized discounted cumulative gain (NDCG). Test runs yielded average results of AUROC = 0.915, AUPRC = 0.176 and NDCG = 0.763 across all investigated species.
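The idea of turning an induced subgraph into image-like input can be sketched as follows. The abstract does not describe how the breadth-first search was modified, so this sketch assumes a plain breadth-first search with a cap on the number of collected nodes; the function names `bfs_nodes` and `induced_adjacency_matrix`, the `max_nodes` parameter and the toy graph are all hypothetical. The resulting 0/1 adjacency matrix plays the role of a small single-channel image.

```python
from collections import deque


def bfs_nodes(adjacency, start, max_nodes):
    """Collect up to max_nodes nodes in breadth-first order from start."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_nodes:
        current = queue.popleft()
        # Sorting the neighbours makes the traversal order deterministic.
        for neighbour in sorted(adjacency[current]):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
                if len(visited) == max_nodes:
                    break
    return visited


def induced_adjacency_matrix(adjacency, nodes):
    """Return the 0/1 adjacency matrix of the subgraph induced by nodes."""
    return [[1 if v in adjacency[u] else 0 for v in nodes] for u in nodes]


# Hypothetical toy network (assumption: undirected, unweighted).
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
nodes = bfs_nodes(adjacency, "A", max_nodes=4)
matrix = induced_adjacency_matrix(adjacency, nodes)
for row in matrix:
    print(row)
```

Fixing `max_nodes` gives every extracted subgraph the same matrix size, which is convenient for a model that expects inputs of constant shape; whether the published tool handles this in the same way is not stated in the abstract.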

CONCLUSION

We developed software for link prediction in PPI networks utilizing machine learning. The evaluation of our software serves as the first demonstration that a cGAN model, conditioned on raw topological features of the PPI network, is an applicable solution for the PPI prediction problem without requiring molecular node attributes, which are often unavailable. The corresponding scripts are available on GitHub (https://github.com/semmelweis-pharmacology/ppi_pred).


