Dept. of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.
Pac Symp Biocomput. 2020;25:343-354.
Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers: an alteration is proposed as a driver when the samples that carry it share a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA Pan-Cancer Atlas dataset, where it produced a high-confidence result relating 59 new connections to 18 known mutation events, including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in the TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE also identifies connections between genes with no previously known relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.
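The progressive label learning idea summarized in the abstract can be illustrated with a toy sketch. This is not the LURE implementation (see the LURE website for the authors' code). It assumes a nearest-centroid rule as a stand-in classifier, a hypothetical acceptance threshold `min_frac`, made-up two-gene expression vectors, and hypothetical event names `GENE_A` and `GENE_B`. It also omits the minimum spanning analysis entirely. The only point is to show how samples carrying a candidate event can be promoted to positive labels when they share the expression signature of a known event.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_positive(expr, positives):
    """Stand-in classifier (assumption): a sample is called positive when it
    lies closer to the positive-class centroid than to the negative one."""
    pos_c = centroid([v for s, v in expr.items() if s in positives])
    neg_c = centroid([v for s, v in expr.items() if s not in positives])
    return {s for s, v in expr.items() if sq_dist(v, pos_c) < sq_dist(v, neg_c)}


def progressive_label_learning(expr, bait_samples, candidate_events, min_frac=0.75):
    """Iteratively grow the positive set, starting from the known event.

    expr: sample id -> expression vector
    bait_samples: samples carrying the known event
    candidate_events: event name -> set of samples altered by that event
    min_frac: hypothetical threshold on the fraction of a candidate's
              unlabeled samples that must be predicted positive
    """
    positives = set(bait_samples)
    remaining = dict(candidate_events)
    accepted = []
    while remaining and 0 < len(positives) < len(expr):
        predicted = predict_positive(expr, positives)
        best, best_frac = None, 0.0
        for event, altered in sorted(remaining.items()):
            unlabeled = set(altered) - positives
            if not unlabeled:
                continue
            frac = len(unlabeled & predicted) / len(unlabeled)
            if frac > best_frac:
                best, best_frac = event, frac
        if best is None or best_frac < min_frac:
            break  # no remaining candidate shares the signature
        # Relabel the accepted event's samples as positive, then retrain.
        positives |= set(remaining.pop(best))
        accepted.append(best)
    return accepted, positives


# Toy data (fabricated for illustration only): two expression features per sample.
expr = {
    "s1": [5.0, 5.0], "s2": [5.2, 4.8],  # carry the known event
    "s3": [4.9, 5.1], "s4": [5.1, 5.0],  # altered in GENE_A
    "s5": [0.0, 0.1], "s6": [0.2, 0.0],  # altered in GENE_B
    "s7": [0.1, 0.2], "s8": [0.0, 0.0],  # no alteration
}
candidates = {"GENE_A": {"s3", "s4"}, "GENE_B": {"s5", "s6"}}

accepted, positives = progressive_label_learning(expr, {"s1", "s2"}, candidates)
print(accepted, sorted(positives))
```

In this toy run the `GENE_A` samples sit next to the known-event samples in expression space, so they are relabeled as positive, while the `GENE_B` samples are not. A real analysis would replace the centroid rule with a properly cross-validated classifier and would score candidates statistically rather than with a fixed fraction.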