Cancer driver gene discovery through an integrative genomics approach in a non-parametric Bayesian framework.

Author information

Yang Hai, Wei Qiang, Zhong Xue, Yang Hushan, Li Bingshan

Affiliations

Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA.

Vanderbilt Genetics Institute, Nashville, TN, USA.

Publication information

Bioinformatics. 2017 Feb 15;33(4):483-490. doi: 10.1093/bioinformatics/btw662.

DOI:10.1093/bioinformatics/btw662

PMID:27797769

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6075201/

Abstract

MOTIVATION

A comprehensive catalogue of the genes that drive tumor initiation and progression is key to advancing cancer diagnostics and therapeutics. Given the complexity of cancer, this catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple omics layers in tumors. In this study, we aim to leverage the complementary information encoded in each omics data type to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling that identifies driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs a background distribution that is used to assess and calibrate the confidence of driver genes identified from multi-dimensional genomic data.
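The abstract does not spell out iDriver's model, but the idea of letting the data decide how many clusters of gene aberration profiles exist can be illustrated with a Dirichlet process mixture. The following is a minimal collapsed Gibbs sampler written under our own assumptions, not the authors' C++ implementation: each gene is summarized as a vector of standardized aberration scores (one entry per omics layer), clusters are isotropic Gaussians with a known variance `sigma2`, cluster means have a zero-centered Gaussian prior with variance `tau2`, and `alpha` is the concentration parameter of the Chinese restaurant process prior. The names `dp_gmm_gibbs`, `log_normal` and all parameters are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) evaluated at x."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (xi - mi) ** 2 / (2 * var)
               for xi, mi in zip(x, mean))


def dp_gmm_gibbs(data, alpha=1.0, sigma2=1.0, tau2=10.0, iters=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet process mixture of Gaussians.

    ASSUMED model (illustrative only, not iDriver's published model):
      x_i | z_i = k ~ N(mu_k, sigma2 * I)   known within-cluster variance
      mu_k          ~ N(0, tau2 * I)        conjugate prior on cluster means
      z             ~ CRP(alpha)            Chinese restaurant process prior
    Returns one cluster label per row of `data` after the final sweep.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    z = [0] * n                                              # start in one cluster
    counts = {0: n}                                          # cluster sizes n_k
    sums = {0: [sum(x[j] for x in data) for j in range(d)]}  # per-cluster sums
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(data):
            # Remove gene i from its current cluster; drop the cluster if empty.
            k = z[i]
            counts[k] -= 1
            sums[k] = [s - xj for s, xj in zip(sums[k], x)]
            if counts[k] == 0:
                del counts[k], sums[k]
            labels, logw = [], []
            for c, nc in counts.items():
                # Posterior of mu_c given its members, then predictive density of x.
                prec = 1.0 / tau2 + nc / sigma2
                mean = [s / sigma2 / prec for s in sums[c]]
                labels.append(c)
                logw.append(math.log(nc) + log_normal(x, mean, 1.0 / prec + sigma2))
            # Option of opening a new cluster, scored by the prior predictive.
            labels.append(next_label)
            logw.append(math.log(alpha) + log_normal(x, [0.0] * d, tau2 + sigma2))
            # Subtract the max log weight for numerical stability, then sample.
            top = max(logw)
            c = rng.choices(labels, weights=[math.exp(v - top) for v in logw])[0]
            if c == next_label:
                counts[c], sums[c] = 0, [0.0] * d
                next_label += 1
            counts[c] += 1
            sums[c] = [s + xj for s, xj in zip(sums[c], x)]
            z[i] = c
    return z
```

Calling `dp_gmm_gibbs(profiles, alpha=0.5)` on a hypothetical list of per-gene score vectors returns one label per gene. The number of distinct labels is inferred from the data rather than fixed in advance, which is what makes this kind of model non-parametric.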

RESULTS

We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g. P < 3.40 × 10⁻³⁶ for breast cancer). We were particularly interested in novel genes, for which we observed multiple lines of supporting evidence. Through systematic evaluation from multiple independent aspects, we identified 45 previously unknown candidate driver genes across these 4 cancer types. These findings suggest that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research.
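The abstract reports the enrichment of known drivers among the candidates as a P-value but does not name the test used. A common choice for this kind of overlap question is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test, and the function and parameter names are hypothetical. It computes P(X ≥ k), where X counts the known drivers among n candidates drawn from N genes, K of which are known drivers.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, known_drivers, candidates, overlap):
    """One-sided hypergeometric P-value P(X >= overlap).

    P = sum_{i=overlap}^{min(K, n)} C(K, i) * C(N - K, n - i) / C(N, n)
    with N = total_genes, K = known_drivers, n = candidates.
    Exact integer arithmetic is used until the final division.
    """
    upper = min(known_drivers, candidates)
    tail = sum(comb(known_drivers, i) * comb(total_genes - known_drivers, candidates - i)
               for i in range(overlap, upper + 1))
    return tail / comb(total_genes, candidates)
```

For example, `hypergeom_enrichment_p(10, 5, 5, 5)` returns 1/252 (about 0.00397). Because `math.comb` works on exact integers, very small tail probabilities are only rounded once, at the final division.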

AVAILABILITY AND IMPLEMENTATION

The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/ .

CONTACTS

hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
