Department of Systems Biology - Unit 950, The University of Texas MD Anderson Cancer Center, 7435 Fannin Street, Houston, TX 77054, USA.
Br J Cancer. 2012 Mar 13;106(6):1107-16. doi: 10.1038/bjc.2011.584. Epub 2012 Feb 16.
The rapid collection of diverse genome-scale data creates an urgent need to integrate and utilise these resources for biological discovery and biomedical applications. For example, transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few existing methods are capable of utilising this emerging information.
We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein-protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighbourhoods.
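The idea of a coincident disturbance in a local network neighbourhood can be illustrated with a minimal sketch. This is not the authors' algorithm, whose scoring and statistics are not described here. It assumes a toy undirected network stored as an adjacency dictionary, two hypothetical gene sets (`expr_hits` for transcriptional disturbances and `cna_hits` for copy number disturbances), and a hypothetical threshold `min_each`. A gene is flagged when its neighbourhood (the gene plus its direct neighbours) contains at least `min_each` disturbances of each type.

```python
from typing import Dict, Set, Tuple


def coincident_neighbourhoods(
    network: Dict[str, Set[str]],
    expr_hits: Set[str],
    cna_hits: Set[str],
    min_each: int = 1,
) -> Dict[str, Tuple[int, int]]:
    """Flag genes whose local neighbourhood holds both kinds of disturbance.

    Illustrative assumption only: the neighbourhood is the gene itself plus
    its direct neighbours, and "coincident" means at least `min_each`
    expression hits and at least `min_each` copy number hits inside it.
    """
    flagged: Dict[str, Tuple[int, int]] = {}
    for gene, neighbours in network.items():
        local = {gene} | neighbours
        n_expr = len(local & expr_hits)  # transcriptional disturbances nearby
        n_cna = len(local & cna_hits)    # copy number disturbances nearby
        if n_expr >= min_each and n_cna >= min_each:
            flagged[gene] = (n_expr, n_cna)
    return flagged


# Toy data with hypothetical gene names.
network = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C"},
    "E": set(),
}
expr_hits = {"B", "E"}  # genes with altered expression (assumed)
cna_hits = {"C"}        # genes with altered copy number (assumed)

result = coincident_neighbourhoods(network, expr_hits, cna_hits)
print(result)
```

In this toy example only gene `A` is flagged, because its neighbourhood is the only one that contains both an expression hit (`B`) and a copy number hit (`C`). A real analysis would also need a significance test against randomised networks or permuted labels, which this sketch omits.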
We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line datasets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network member genes have increased functional specificity for TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone.
Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important.