Department of Biochemistry, Stanford University, Stanford, CA 94305.
Department of Biomedical Data Science, Stanford University, Stanford, CA 94305.
Proc Natl Acad Sci U S A. 2019 Jul 30;116(31):15524-15533. doi: 10.1073/pnas.1900391116. Epub 2019 Jul 15.
The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling 10-fold fewer false-positive fusions in nontransformed human tissues. We leverage the increased precision of DEEPEST to discover fundamental cancer biology. Namely, we identify 888 candidate oncogenes based on their overrepresentation in DEEPEST calls, as well as 1,078 previously unreported fusions involving long intergenic noncoding RNAs, demonstrating a previously unappreciated prevalence of these fusions and their potential for function. DEEPEST also reveals a high enrichment for fusions involving oncogenes in cancers, including ovarian cancer, which has had minimal treatment advances in recent decades: more than 50% of tumors harbor gene fusions predicted to be oncogenic. Specific protein domains are enriched in DEEPEST calls, indicating a global selection for fusion functionality: kinase domains are nearly 2-fold more enriched in DEEPEST calls than expected by chance, as are domains involved in (anaerobic) metabolism and DNA binding. The statistical algorithms, population-level analytic framework, and biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.