Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Bioinformatics. 2011 Jun 1;27(11):1473-80. doi: 10.1093/bioinformatics/btr183. Epub 2011 Apr 15.
Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a central task in discovering potential cancer-driving genes such as oncogenes and tumor suppressor genes. Recent advances in SNP array technology have enabled genome-wide, high-resolution studies of copy number changes. However, existing copy number analysis methods ignore normal cell contamination and cannot distinguish the contributions of cancerous and normal cells to the measured copy number signals. This contamination can significantly confound downstream analysis of CNAs and reduce the power to detect SCEs in clinical samples.
We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination and, accordingly, recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by ground truth and biological plausibility. Moreover, a large number of comparative simulation studies show that the proposed method significantly improves the power to detect SCEs after in silico correction of normal tissue contamination. We developed a cross-platform, open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues, including the relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
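To make the contamination problem concrete, here is a minimal sketch of a simplified linear two-component mixture, not the BACOM algorithm itself (BACOM is a Bayesian method and uses more information than a single segment mean). It assumes that a fraction `alpha` of the cells are normal diploid cells (copy number 2), that normal and cancer cells contribute equal amounts of DNA, and that the measured signal is already on a copy number scale. The function names are hypothetical and chosen for illustration only.

```python
def observed_copy_number(tumor_cn: float, alpha: float, normal_cn: float = 2.0) -> float:
    """Expected measured copy number of a tumor/normal mixture.

    Hypothetical helper: a fraction `alpha` of cells are normal (copy number
    `normal_cn`) and the remaining 1 - alpha are cancer cells with `tumor_cn`.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return alpha * normal_cn + (1.0 - alpha) * tumor_cn


def corrected_copy_number(observed_cn: float, alpha: float, normal_cn: float = 2.0) -> float:
    """Invert the mixture model to recover the cancer-cell copy number."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (observed_cn - alpha * normal_cn) / (1.0 - alpha)


def alpha_from_deletion(observed_cn: float, deletion_type: str) -> float:
    """Estimate the normal-cell fraction from one deletion segment of known type."""
    if deletion_type == "homozygous":  # tumor CN = 0, so observed = 2 * alpha
        return observed_cn / 2.0
    if deletion_type == "hemizygous":  # tumor CN = 1, so observed = 1 + alpha
        return observed_cn - 1.0
    raise ValueError(f"unknown deletion type: {deletion_type}")


if __name__ == "__main__":
    alpha = 0.3
    seen = observed_copy_number(1.0, alpha)  # hemizygous deletion in a 30%-normal sample
    print(seen, corrected_copy_number(seen, alpha))
```

Under this simplified model, a hemizygous deletion in a sample with 30% normal cells is measured at about 1.3 copies rather than 1, which can hide the deletion from a contamination-oblivious caller. The same measured value is also ambiguous: 1.2 copies fits a hemizygous deletion with `alpha = 0.2` or a homozygous deletion with `alpha = 0.6`, which illustrates why deletion type and contamination have to be estimated together before the true copy number can be recovered.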
The cross-platform, stand-alone Java application BACOM, the R interface bacomR, all source code and the simulation data used in this article are freely available at the authors' web site: http://www.cbil.ece.vt.edu/software.htm.