Tran Linh M, Zhang Bin, Zhang Zhan, Zhang Chunsheng, Xie Tao, Lamb John R, Dai Hongyue, Schadt Eric E, Zhu Jun
Sage Bionetworks, Seattle, WA 98109, USA.
BMC Syst Biol. 2011 Aug 1;5:121. doi: 10.1186/1752-0509-5-121.
One of the primary objectives in cancer research is to identify the causal genomic alterations, such as somatic copy number variations (CNVs) and somatic mutations, that arise during tumor development. Many valuable studies lack the genomic data needed to detect CNVs; methods that can infer CNVs from gene expression data would therefore help maximize the value of these studies.
We developed a framework for identifying recurrent regions of CNV and for distinguishing cancer driver genes from passenger genes within those regions. By inferring CNV regions across many datasets, we identified 109 recurrent amplified or deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes, which were validated by siRNA experiments.
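The abstract does not say how recurrence across datasets is scored. The sketch below is minimal and rests on several assumptions. Each dataset has already been reduced to per-gene calls (+1 amplified, -1 deleted, 0 neutral), and all datasets are aligned to a common, position-sorted gene order. A gene is treated as recurrent when the same call appears in at least `min_datasets` datasets. Adjacent genes with the same recurrent call are merged into one region. The function name `recurrent_regions`, the call encoding and the threshold rule are hypothetical illustrations, not the authors' criteria.

```python
def recurrent_regions(calls_per_dataset, min_datasets):
    """Merge recurrently altered genes into regions (hypothetical helper).

    calls_per_dataset: one list per dataset of per-gene calls
        (+1 amplified, -1 deleted, 0 neutral), all aligned to the same
        position-sorted gene order.
    min_datasets: minimum number of datasets that must share a call.
    Returns (first_gene_index, last_gene_index, "amp" | "del") tuples.
    """
    if not calls_per_dataset:
        return []
    n_genes = len(calls_per_dataset[0])
    status = []
    for g in range(n_genes):
        n_amp = sum(1 for calls in calls_per_dataset if calls[g] == 1)
        n_del = sum(1 for calls in calls_per_dataset if calls[g] == -1)
        # Assumption: if a gene passes both thresholds, amplification wins.
        if n_amp >= min_datasets:
            status.append(1)
        elif n_del >= min_datasets:
            status.append(-1)
        else:
            status.append(0)

    regions = []
    run_start = 0
    for g in range(1, n_genes + 1):
        # Close the current run at the end of the gene list or on a status change.
        if g == n_genes or status[g] != status[run_start]:
            if status[run_start] != 0:
                label = "amp" if status[run_start] == 1 else "del"
                regions.append((run_start, g - 1, label))
            run_start = g
    return regions


# Three toy datasets over six position-sorted genes.
datasets = [
    [1, 1, 0, -1, -1, 0],
    [1, 1, 0, -1, 0, 0],
    [0, 1, 0, -1, -1, 1],
]
print(recurrent_regions(datasets, min_datasets=2))
# → [(0, 1, 'amp'), (3, 4, 'del')]
```

Requiring agreement in a fixed number of datasets is the simplest rule for a sketch. A frequency threshold or a permutation-based significance test would be natural alternatives.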
To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework couples wavelet analysis of expression-based copy number alteration with gene regulatory network analysis. It provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and to other novel types of cancer data, such as next-generation sequencing-based expression data (RNA-Seq) and CNV data.
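The wavelet step is named here but not specified. The sketch below illustrates the general idea on a single chromosome. It assumes the expression values (for example, per-gene z-scores) are already sorted by genomic position. It applies an orthonormal Haar wavelet transform and zeroes small detail coefficients (hard thresholding), which suppresses gene-level noise while preserving step-like shifts in level. It then reconstructs the signal and calls genes amplified or deleted with a fixed cutoff.

All of the following are assumptions for illustration: the Haar basis, the hard threshold, the edge padding, and the hypothetical names `haar_denoise` and `call_cnv` with their `threshold` and `cutoff` parameters. The paper's actual wavelet family and thresholding rule may differ.

```python
import math


def _haar_forward(x):
    """Orthonormal Haar transform of a length-2**k list.

    Returns the final approximation coefficients and a list of detail
    coefficient lists ordered from finest to coarsest level."""
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
    return approx, details


def _haar_inverse(approx, details):
    """Invert _haar_forward, rebuilding from the coarsest level down."""
    values = list(approx)
    for d in reversed(details):
        rebuilt = []
        for a, c in zip(values, d):
            rebuilt.append((a + c) / math.sqrt(2))
            rebuilt.append((a - c) / math.sqrt(2))
        values = rebuilt
    return values


def haar_denoise(expr, threshold):
    """Smooth position-ordered expression values (hypothetical helper).

    Pads to the next power of two by repeating the last value, zeroes every
    detail coefficient whose magnitude is below `threshold` (hard
    thresholding), reconstructs, and trims the padding."""
    if not expr:
        return []
    n = len(expr)
    size = 1
    while size < n:
        size *= 2
    padded = list(expr) + [expr[-1]] * (size - n)
    approx, details = _haar_forward(padded)
    kept = [[c if abs(c) >= threshold else 0.0 for c in level] for level in details]
    return _haar_inverse(approx, kept)[:n]


def call_cnv(smoothed, cutoff):
    """Map smoothed values to +1 (gain), -1 (loss) or 0 (neutral)."""
    return [1 if v > cutoff else -1 if v < -cutoff else 0 for v in smoothed]


# Toy chromosome: four neutral genes followed by four genes at a shifted level.
expr = [0.1, -0.1, 0.0, 0.0, 2.1, 1.9, 2.0, 2.0]
smoothed = haar_denoise(expr, threshold=0.5)
print([round(v, 6) for v in smoothed])
print(call_cnv(smoothed, cutoff=1.0))
```

Hard thresholding keeps the large coarse-scale coefficient that encodes the jump between the two blocks and discards the small fine-scale ones, so the reconstruction is piecewise constant. Setting `threshold=0.0` keeps every coefficient and returns the input unchanged.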