Rouam Sigrid, Miller Lance D, Karuturi R Krishna Murthy
Procter and Gamble International Operations SA Singapore Branch, Statistics Asia, Singapore.
Department of Cancer Biology, Wake Forest University School of Medicine, Winston-Salem, NC, USA.
Cancer Inform. 2015 Feb 3;13(Suppl 6):35-48. doi: 10.4137/CIN.S18302. eCollection 2014.
Driver genes are directly responsible for oncogenesis, and identifying them is essential to fully understand the mechanisms of cancer. However, it is difficult to distinguish them from the larger pool of genes that are deregulated in cancer (ie, passenger genes). To address this problem, we developed an approach called TRIAngulating Gene Expression (TRIAGE through clinico-genomic intersects). Here, we present a refinement of this approach that incorporates a new scoring methodology to identify putative driver genes that are deregulated in cancer. TRIAGE triangulates, or integrates, three levels of information: gene expression, gene location, and patient survival. First, TRIAGE identifies regions of deregulated expression (ie, expression footprints) by deriving a newly established measure, the Local Singular Value Decomposition (LSVD) score, for each locus. Driver genes are then distinguished from passenger genes using dual survival analyses. These analyses are performed on the genes located in significant expression footprints, and they incorporate measurements of gene expression weighted according to the LSVD weight of each tumor. Here, we first use simulated data to characterize the newly established LSVD score. We then present the results of applying this refined version of TRIAGE to gene expression data from five cancer types. The refined version of TRIAGE not only allowed us to identify known prominent driver genes, such as MMP1, IL8, and COL1A2, but also led us to identify several novel ones. These results illustrate that TRIAGE complements existing tools, allows for the identification of genes that drive cancer, and could perhaps elucidate potential future targets of novel anticancer therapeutics.
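The abstract does not define the LSVD score, so the following is only an illustrative sketch of what a local, SVD-based footprint score might look like, not the authors' method. It assumes that genes are ordered by genomic position, that the score at a locus is the fraction of variance captured by the first singular component in a window of neighboring genes, and that the per-tumor weight is the absolute loading of each tumor on that component. The function name `local_svd_scores`, the parameter `half_window`, and the simulated data are all hypothetical.

```python
import numpy as np

def local_svd_scores(expr, half_window=5):
    """Illustrative local SVD score (an assumption, not the published LSVD).

    expr: array of shape (n_genes, n_tumors), rows ordered by genomic position.
    Returns (scores, weights): scores has shape (n_genes,), weights has shape
    (n_genes, n_tumors).
    """
    n_genes, n_tumors = expr.shape
    scores = np.zeros(n_genes)
    weights = np.zeros((n_genes, n_tumors))
    for i in range(n_genes):
        lo = max(0, i - half_window)
        hi = min(n_genes, i + half_window + 1)
        block = expr[lo:hi]
        # Center each gene across tumors so the SVD reflects co-variation.
        block = block - block.mean(axis=1, keepdims=True)
        _, s, vt = np.linalg.svd(block, full_matrices=False)
        total = np.sum(s ** 2)
        # Fraction of local variance explained by the first component.
        scores[i] = (s[0] ** 2) / total if total > 0 else 0.0
        # Assumed per-tumor weight: absolute loading on the first component.
        weights[i] = np.abs(vt[0])
    return scores, weights

# Simulated data: 60 genes x 40 tumors of noise, with genes 20..30 sharing
# a common per-tumor signal (a hypothetical "expression footprint").
rng = np.random.default_rng(0)
expr = rng.normal(size=(60, 40))
shared = rng.normal(size=40)
expr[20:31] += 3.0 * shared

scores, weights = local_svd_scores(expr, half_window=5)
```

Under these assumptions, loci inside the simulated footprint receive scores close to 1 and loci in pure-noise regions receive much lower scores. The weights could then be used to weight each tumor's expression in a downstream survival model, which is not shown here.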