Vandin Fabio, Upfal Eli, Raphael Benjamin J
Department of Computer Science, Brown University, Providence, Rhode Island, USA.
J Comput Biol. 2011 Mar;18(3):507-22. doi: 10.1089/cmb.2010.0265.
Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common practice is to assess whether known pathways are enriched for mutated genes. We introduce an alternative approach that examines mutated genes in the context of a genome-scale gene interaction network. We present a computationally efficient strategy for de novo identification of subnetworks in an interaction network that are mutated in a statistically significant number of patients. This framework includes two major components. First, we use a diffusion process on the interaction network to define a local neighborhood of "influence" for each mutated gene in the network. Second, we derive a two-stage multiple hypothesis test to bound the false discovery rate (FDR) associated with the identified subnetworks. We test these algorithms on a large human protein-protein interaction network using somatic mutation data from glioblastoma and lung adenocarcinoma samples. We successfully recover pathways that are known to be important in these cancers and also identify additional pathways that have been implicated in other cancers but not previously reported as mutated in these samples. We anticipate that our approach will find increasing use as cancer genome studies increase in size and scope.
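To make the first component more concrete, the following is a minimal sketch of how a diffusion process can assign each gene a local neighborhood of influence. The abstract does not specify the diffusion model, so this sketch assumes a heat-diffusion formulation in which every node loses heat at a rate `gamma` and the equilibrium influence is the inverse of the graph Laplacian plus `gamma` times the identity. The function name `diffusion_influence`, the parameter `gamma`, and the toy network are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def diffusion_influence(adjacency, gamma):
    """Return a matrix F where F[i, j] is the equilibrium heat at gene i
    when gene j is a unit heat source.

    Assumed model (not stated in the abstract): heat flows along edges and
    leaks out of every node at rate gamma, so F = (L + gamma * I)^-1 with
    L the graph Laplacian of the undirected interaction network.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    if adjacency.ndim != 2 or adjacency.shape[0] != adjacency.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.allclose(adjacency, adjacency.T):
        raise ValueError("adjacency must be symmetric (undirected network)")
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    n = adjacency.shape[0]
    # L + gamma * I is positive definite for gamma > 0, so it is invertible.
    return np.linalg.inv(laplacian + gamma * np.eye(n))


# Toy network: four genes on a path, g0 - g1 - g2 - g3.
toy_network = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
influence = diffusion_influence(toy_network, gamma=0.5)

# Column 0 lists how strongly a mutation in g0 influences each gene;
# the values shrink with network distance from g0.
print(influence[:, 0])
```

In this toy setting, a smaller `gamma` lets heat spread further before it leaks away, which widens each gene's neighborhood; a larger `gamma` keeps influence close to the mutated gene. How the resulting influence values are combined with mutation frequencies and thresholded into subnetworks is not described in the abstract and is left out of the sketch.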