大型网络中的图动物、子图采样和基序搜索。

Baskerville Kim, Grassberger Peter, Paczuski Maya

Perimeter Institute for Theoretical Physics, Waterloo, Canada N2L 2Y5.

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Sep;76(3 Pt 2):036107. doi: 10.1103/PhysRevE.76.036107. Epub 2007 Sep 11.

We generalize a sampling algorithm for lattice animals (connected clusters on a regular lattice) to a Monte Carlo algorithm for "graph animals," i.e., connected subgraphs in arbitrary networks. As with the algorithm in [N. Kashtan et al., Bioinformatics 20, 1746 (2004)], it provides a weighted sample, but the computation of the weights is much faster (linear in the size of subgraphs, instead of superexponential). This allows subgraphs with up to ten or more nodes to be sampled with very high statistics, from arbitrarily large networks. Using this together with a heuristic algorithm for rapidly classifying isomorphic graphs, we present results for two protein interaction networks obtained using the tandem affinity purification (TAP) method: one of Escherichia coli with 230 nodes and 695 links, and one for yeast (Saccharomyces cerevisiae) with roughly ten times more nodes and links. We find in both cases that most connected subgraphs are strong motifs (Z scores >10) or antimotifs (Z scores <-10) when the null model is the ensemble of networks with fixed degree sequence. Strong differences appear between the two networks, with dominant motifs in E. coli being (nearly) bipartite graphs and having many pairs of nodes that connect to the same neighbors, while dominant motifs in yeast tend towards completeness or contain large cliques. We also explore a number of methods that do not rely on measurements of Z scores or comparisons with null models. For instance, we discuss the influence of specific complexes like the 26S proteasome in yeast, where a small number of complexes dominate the k cores with large k and have a decisive effect on the strongest motifs with 6-8 nodes. We also present Zipf plots of counts versus rank. They show broad distributions that are not power laws, in contrast to the case when disconnected subgraphs are included.

我们将一种用于晶格动物（规则晶格上的连通簇）的采样算法推广为一种用于“图动物”的蒙特卡罗算法，即任意网络中的连通子图。与文献[N. Kashtan等人，《生物信息学》20, 1746 (2004)]中的算法一样，它提供加权样本，但权重的计算速度要快得多（与子图大小呈线性关系，而非超指数关系）。这使得可以从任意大的网络中以非常高的统计量对节点数多达十个或更多的子图进行采样。将此方法与一种用于快速分类同构图的启发式算法相结合，我们给出了使用串联亲和纯化（TAP）方法获得的两个蛋白质相互作用网络的结果：一个是大肠杆菌的网络，有230个节点和695条边；另一个是酵母（酿酒酵母）的网络，其节点数和边数大约是大肠杆菌网络的十倍。我们发现在这两种情况下，当零模型是具有固定度序列的网络集合时，大多数连通子图是强基序（Z分数>10）或反基序（Z分数<-10）。两个网络之间出现了显著差异，大肠杆菌中的主导基序是（近乎）二分图，并且有许多对节点连接到相同的邻居，而酵母中的主导基序倾向于完备性或包含大的团簇。我们还探索了一些不依赖于Z分数测量或与零模型比较的方法。例如，我们讨论了酵母中特定复合物（如26S蛋白酶体）的影响，其中少数复合物在大k值的k核中占主导地位，并对具有6 - 8个节点的最强基序有决定性作用。我们还给出了计数与秩的齐普夫图。与包含不连通子图的情况相比，它们显示出非幂律的广泛分布。

相似文献

Graph animals, subgraph sampling, and motif search in large networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Sep;76(3 Pt 2):036107. doi: 10.1103/PhysRevE.76.036107. Epub 2007 Sep 11.

Subgraph ensembles and motif discovery using an alternative heuristic for graph isomorphism.

Phys Rev E Stat Nonlin Soft Matter Phys. 2006 Nov;74(5 Pt 1):051903. doi: 10.1103/PhysRevE.74.051903. Epub 2006 Nov 3.

Network motifs come in sets: correlations in the randomization process.

Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Jul;82(1 Pt 1):011921. doi: 10.1103/PhysRevE.82.011921. Epub 2010 Jul 22.

Subgraphs in random networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Aug;68(2 Pt 2):026127. doi: 10.1103/PhysRevE.68.026127. Epub 2003 Aug 25.

An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks.

BMC Syst Biol. 2014;8 Suppl 3(Suppl 3):S6. doi: 10.1186/1752-0509-8-S3-S6. Epub 2014 Oct 22.

An algorithm for network motif discovery in biological networks.

Int J Data Min Bioinform. 2012;6(1):1-16. doi: 10.1504/ijdmb.2012.045533.

Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs.

Bioinformatics. 2004 Jul 22;20(11):1746-58. doi: 10.1093/bioinformatics/bth163. Epub 2004 Mar 4.

Detection of Complexes in Biological Networks Through Diversified Dense Subgraph Mining.

J Comput Biol. 2017 Sep;24(9):923-941. doi: 10.1089/cmb.2017.0037. Epub 2017 Jun 1.

Subgraphs and network motifs in geometric networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Feb;71(2 Pt 2):026117. doi: 10.1103/PhysRevE.71.026117. Epub 2005 Feb 22.

Link and subgraph likelihoods in random undirected networks with fixed and partially fixed degree sequences.

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Oct;76(4 Pt 2):046112. doi: 10.1103/PhysRevE.76.046112. Epub 2007 Oct 19.

Suppr 超能文献

核心技术专利：CN118964589B侵权必究

相似文献

Graph animals, subgraph sampling, and motif search in large networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Sep;76(3 Pt 2):036107. doi: 10.1103/PhysRevE.76.036107. Epub 2007 Sep 11.

Subgraph ensembles and motif discovery using an alternative heuristic for graph isomorphism.

Phys Rev E Stat Nonlin Soft Matter Phys. 2006 Nov;74(5 Pt 1):051903. doi: 10.1103/PhysRevE.74.051903. Epub 2006 Nov 3.

Network motifs come in sets: correlations in the randomization process.

Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Jul;82(1 Pt 1):011921. doi: 10.1103/PhysRevE.82.011921. Epub 2010 Jul 22.

Subgraphs in random networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Aug;68(2 Pt 2):026127. doi: 10.1103/PhysRevE.68.026127. Epub 2003 Aug 25.

An novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks.

BMC Syst Biol. 2014;8 Suppl 3(Suppl 3):S6. doi: 10.1186/1752-0509-8-S3-S6. Epub 2014 Oct 22.

An algorithm for network motif discovery in biological networks.

Int J Data Min Bioinform. 2012;6(1):1-16. doi: 10.1504/ijdmb.2012.045533.

Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs.

Bioinformatics. 2004 Jul 22;20(11):1746-58. doi: 10.1093/bioinformatics/bth163. Epub 2004 Mar 4.

Detection of Complexes in Biological Networks Through Diversified Dense Subgraph Mining.

J Comput Biol. 2017 Sep;24(9):923-941. doi: 10.1089/cmb.2017.0037. Epub 2017 Jun 1.

Subgraphs and network motifs in geometric networks.

Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Feb;71(2 Pt 2):026117. doi: 10.1103/PhysRevE.71.026117. Epub 2005 Feb 22.

Link and subgraph likelihoods in random undirected networks with fixed and partially fixed degree sequences.

Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Oct;76(4 Pt 2):046112. doi: 10.1103/PhysRevE.76.046112. Epub 2007 Oct 19.

Graph animals, subgraph sampling, and motif search in large networks.

作者信息

机构信息

出版信息

相似文献

文献AI研究员

用中文搜PubMed

文档翻译

Suppr 超能文献

相似文献