Suppr超能文献

大型网络中的图动物、子图采样和基序搜索。

Graph animals, subgraph sampling, and motif search in large networks.

作者信息

Baskerville Kim, Grassberger Peter, Paczuski Maya

机构信息

Perimeter Institute for Theoretical Physics, Waterloo, Canada N2L 2Y5.

出版信息

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Sep;76(3 Pt 2):036107. doi: 10.1103/PhysRevE.76.036107. Epub 2007 Sep 11.

Abstract

We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for "graph animals," i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores >10) or antimotifs (Z scores <-10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.

摘要

我们将一种用于晶格动物(规则晶格上的连通簇)的采样算法推广为一种用于“图动物”的蒙特卡罗算法,即任意网络中的连通子图。与文献[N. Kashtan等人,《生物信息学》20, 1746 (2004)]中的算法一样,它提供加权样本,但权重的计算速度要快得多(与子图大小呈线性关系,而非超指数关系)。这使得可以从任意大的网络中以非常高的统计量对节点数多达十个或更多的子图进行采样。将此方法与一种用于快速分类同构图的启发式算法相结合,我们给出了使用串联亲和纯化(TAP)方法获得的两个蛋白质相互作用网络的结果:一个是大肠杆菌的网络,有230个节点和695条边;另一个是酵母(酿酒酵母)的网络,其节点数和边数大约是大肠杆菌网络的十倍。我们发现在这两种情况下,当零模型是具有固定度序列的网络集合时,大多数连通子图是强基序(Z分数>10)或反基序(Z分数<-10)。两个网络之间出现了显著差异,大肠杆菌中的主导基序是(近乎)二分图,并且有许多对节点连接到相同的邻居,而酵母中的主导基序倾向于完备性或包含大的团簇。我们还探索了一些不依赖于Z分数测量或与零模型比较的方法。例如,我们讨论了酵母中特定复合物(如26S蛋白酶体)的影响,其中少数复合物在大k值的k核中占主导地位,并对具有6 - 8个节点的最强基序有决定性作用。我们还给出了计数与秩的齐普夫图。与包含不连通子图的情况相比,它们显示出非幂律的广泛分布。

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验