Kartha Vinay K, Sebastiani Paola, Kern Joseph G, Zhang Liye, Varelas Xaralabos, Monti Stefano
Bioinformatics Program, Boston University, Boston, MA, United States.
Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States.
Front Genet. 2019 Feb 19;10:121. doi: 10.3389/fgene.2019.00121. eCollection 2019.
The identification of genetic alteration combinations as drivers of a given phenotypic outcome, such as drug sensitivity, gene or protein expression, and pathway activity, is a challenging task that is essential to gaining new biological insights and to discovering therapeutic targets. Existing methods designed to predict complementary drivers of such outcomes lack analytical flexibility, including the support for joint analyses of multiple genomic alteration types, such as somatic mutations and copy number alterations, multiple scoring functions, and rigorous significance and reproducibility testing procedures. To address these limitations, we developed Candidate Driver Analysis or CaDrA, an integrative framework that implements a step-wise heuristic search approach to identify functionally relevant subsets of genomic features that, together, are maximally associated with a specific outcome of interest. We show CaDrA's overall high sensitivity and specificity for typically sized multi-omic datasets using simulated data, and demonstrate CaDrA's ability to identify known mutations linked with sensitivity of cancer cells to drug treatment using data from the Cancer Cell Line Encyclopedia (CCLE). We further apply CaDrA to identify novel regulators of oncogenic activity mediated by Hippo signaling pathway effectors YAP and TAZ in primary breast cancer tumors using data from The Cancer Genome Atlas (TCGA), which we functionally validate. Finally, we use pan-cancer TCGA protein expression data to show the high reproducibility of CaDrA's search procedure. Collectively, this work demonstrates the utility of our framework for supporting the fast querying of large, publicly available multi-omics datasets, including but not limited to TCGA and CCLE, for potential drivers of a given target profile of interest.
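As a rough illustration of what a step-wise heuristic search over genomic features can look like, the following minimal sketch greedily grows a feature set one alteration at a time. It is not CaDrA's implementation or API. The abstract does not specify how features are combined or scored, so the sketch assumes binary alteration calls merged by logical OR and uses a simple mean-difference score as a stand-in for the framework's scoring functions. All names (`mean_difference`, `stepwise_search`, the toy data) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def mean_difference(altered: Sequence[int], outcome: Sequence[float]) -> float:
    """Stand-in score (an assumption, not CaDrA's scoring function):
    mean outcome of altered samples minus mean outcome of the rest."""
    hit = [y for a, y in zip(altered, outcome) if a]
    miss = [y for a, y in zip(altered, outcome) if not a]
    if not hit or not miss:
        return float("-inf")  # undefined when one group is empty
    return sum(hit) / len(hit) - sum(miss) / len(miss)


def stepwise_search(
    features: Dict[str, List[int]],
    outcome: Sequence[float],
    max_size: int = 3,
) -> Tuple[List[str], float]:
    """Greedy forward search: at each step, add the feature whose union (OR)
    with the current set gives the best score; stop when nothing improves."""
    selected: List[str] = []
    union = [0] * len(outcome)
    best = float("-inf")
    while len(selected) < max_size:
        candidates = []
        for name, calls in features.items():
            if name in selected:
                continue
            merged = [u | c for u, c in zip(union, calls)]
            candidates.append((mean_difference(merged, outcome), name, merged))
        if not candidates:
            break
        score, name, merged = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # no candidate improves the current score
        selected.append(name)
        union, best = merged, score
    return selected, best


# Toy data (hypothetical): 8 samples, outcome is high in the first four.
toy_outcome = [9.0, 9.0, 8.0, 8.0, 1.0, 2.0, 1.0, 2.0]
toy_features = {
    "A": [1, 1, 0, 0, 0, 0, 0, 0],  # altered in two high-outcome samples
    "B": [0, 0, 1, 1, 0, 0, 0, 0],  # altered in the other two
    "C": [0, 0, 0, 0, 0, 1, 0, 0],  # unrelated alteration
}
selected, best = stepwise_search(toy_features, toy_outcome)
print(selected, best)  # ['A', 'B'] 7.0
```

In this toy case neither A nor B alone covers every high-outcome sample, but their union does, which is the sense in which a set of features can be jointly associated with an outcome. A real analysis would also need the significance and reproducibility testing the abstract mentions, which this sketch omits.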