SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment.

Affiliation

Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA.

Publication information

NPJ Syst Biol Appl. 2022 Jul 20;8(1):25. doi: 10.1038/s41540-022-00232-x.

DOI:10.1038/s41540-022-00232-x

PMID:35859153

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9300714/

Abstract

Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two protein-protein interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of "good" topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting network alignment frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a "proof of concept" on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
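The idea of finding node pairs that are "robustly aligned across multiple samples" can be illustrated with a minimal sketch. It assumes that each sampled alignment is a one-to-one mapping from nodes of one network to nodes of the other, and that NAF for a pair is simply the fraction of samples in which that pair is aligned. The abstract does not give the exact definition, so this is an assumption. The function names, the toy node labels, and the threshold value are hypothetical and are not taken from the SANA software.

```python
from collections import Counter


def network_alignment_frequency(alignments):
    """Return {(u, v): fraction of sampled alignments that map u to v}.

    `alignments` is a list of dicts, each mapping a node of network 1
    to a node of network 2 (an assumed representation, not SANA's own format).
    """
    if not alignments:
        raise ValueError("at least one alignment sample is required")
    counts = Counter()
    for alignment in alignments:
        # Each (u, v) item is one aligned node pair in this sample.
        counts.update(alignment.items())
    total = len(alignments)
    return {pair: count / total for pair, count in counts.items()}


def robust_pairs(naf, threshold):
    """Keep only the pairs whose frequency is at or above `threshold`."""
    return {pair for pair, freq in naf.items() if freq >= threshold}


# Four toy alignment samples between hypothetical networks A and B.
samples = [
    {"a1": "b1", "a2": "b2", "a3": "b3"},
    {"a1": "b1", "a2": "b3", "a3": "b2"},
    {"a1": "b1", "a2": "b2", "a3": "b4"},
    {"a1": "b1", "a2": "b2", "a3": "b3"},
]

naf = network_alignment_frequency(samples)
stable = robust_pairs(naf, threshold=0.75)  # hypothetical cutoff
```

In this toy input, the pair (a1, b1) is aligned in every sample and (a2, b2) in three of four, so both pass the cutoff, while the pairs involving a3 vary from sample to sample and are filtered out. Pairs that survive such a filter would be the candidates for transferring GO terms across species.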
