Department of Computer Science, University of New Mexico, NM, USA.
IET Syst Biol. 2009 Sep;3(5):404-13. doi: 10.1049/iet-syb.2008.0161.
Cellular networks inferred from condition-specific microarray data can capture the functional rewiring of cells in response to different environmental conditions. Unfortunately, many algorithms for inferring cellular networks do not scale to whole-genome data with thousands of variables. We propose a novel approach for scalable learning of large networks: cluster and infer networks (CIN). CIN learns network structures in two steps: (a) partition variables into smaller clusters, and (b) learn a network for each cluster. Optionally, we revisit the cluster assignment of variables with poor neighbourhoods. Results on networks with known topologies suggest that CIN offers substantial speed benefits without substantial loss of performance. We applied our approach to microarray compendia of glucose-starved yeast cells. The inferred networks contained a significantly higher number of subgraphs representing meaningful biological dependencies than random graphs did. Analysis of these subgraphs identified biological processes that agreed well with existing information about yeast populations under glucose starvation, and also implicated novel pathways not previously known to be associated with these populations. [Includes supplementary material].
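The two-step structure of CIN can be illustrated with a minimal sketch. The abstract does not specify the clustering method or the per-cluster network learner, so both are stand-ins chosen for brevity: plain k-means for step (a) and thresholded Pearson correlation for step (b). The function names (`cluster_variables`, `learn_cluster_network`, `cluster_and_infer`) and the `threshold` parameter are hypothetical, and the optional reassignment of variables with poor neighbourhoods is omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_variables(profiles, k, n_iter=20, seed=0):
    """Step (a): partition variables into k clusters.

    ASSUMPTION: plain k-means on each variable's expression profile;
    the abstract does not say which clustering method CIN uses.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign every variable to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in profiles
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def learn_cluster_network(profiles, members, threshold=0.8):
    """Step (b): learn a network restricted to one cluster.

    ASSUMPTION: an edge is drawn when |Pearson correlation| >= threshold;
    this is a stand-in for whatever structure learner the paper uses.
    """
    edges = set()
    for a in range(len(members)):
        for b in range(a + 1, len(members)):
            i, j = members[a], members[b]
            if abs(pearson(profiles[i], profiles[j])) >= threshold:
                edges.add((i, j))
    return edges


def cluster_and_infer(profiles, k, threshold=0.8):
    """Run both steps; only within-cluster variable pairs are ever compared."""
    labels = cluster_variables(profiles, k)
    edges = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        edges |= learn_cluster_network(profiles, members, threshold)
    return labels, edges
```

Here `profiles` is a list with one entry per variable (for example, a gene), each entry being that variable's measurements across samples. The speed benefit described above comes from step (b) comparing only variables that share a cluster, so the number of candidate pairs grows with the cluster sizes rather than with the total number of variables.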