Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA.
Bioinformatics. 2010 Feb 15;26(4):464-9. doi: 10.1093/bioinformatics/btp708. Epub 2009 Dec 23.
MOTIVATION: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implications for tumorigenesis. Commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may impose a heavy computational burden and lose overall statistical power, because each sample's data are segmented and discretized. We propose a population-based approach for RCNA detection that requires no single-sample analysis, is statistically powerful and computationally efficient, and is particularly suitable for high-resolution and large-population studies.
RESULTS: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. By directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces the computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of two-step approaches based on single-sample CNA calling. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
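The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: scoring each chromosomal site by its correlation with nearby sites across all samples, using only a band around the diagonal of the site-by-site correlation matrix instead of the full matrix. The function name `diagonal_band_score`, the `band` width, and all simulation parameters are assumptions made for illustration; this is not the authors' CMDS implementation, and it omits any segmentation or significance testing.

```python
import numpy as np

def diagonal_band_score(ratios, band=10):
    """Per-site mean Pearson correlation with neighbors within `band` sites.

    ratios: array of shape (n_samples, n_sites) holding intensity ratios.
    Only the first `band` off-diagonals of the correlation matrix are
    computed, so the full n_sites x n_sites matrix is never built.
    Assumes n_sites >= 2.
    """
    n_samples, n_sites = ratios.shape
    # Standardize each site across samples so that the mean of products
    # of two standardized columns equals their Pearson correlation.
    centered = ratios - ratios.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # guard against constant sites
    z = centered / std

    total = np.zeros(n_sites)
    count = np.zeros(n_sites)
    for k in range(1, band + 1):
        # k-th off-diagonal: correlation between site i and site i + k.
        r = (z[:, :-k] * z[:, k:]).mean(axis=0)
        # Each correlation contributes to both sites of the pair.
        total[:-k] += r
        count[:-k] += 1
        total[k:] += r
        count[k:] += 1
    return total / count

# Hypothetical simulated data: 60 samples, 200 sites, with a recurrent
# gain in sites 80..99 carried by the first 30 samples.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.3, size=(60, 200))
data[:30, 80:100] += 0.6

scores = diagonal_band_score(data, band=10)
region_mean = scores[80:100].mean()
background_mean = scores[:60].mean()
print(f"recurrent region: {region_mean:.3f}, background: {background_mean:.3f}")
```

In this toy setting, sites inside the shared gain are correlated across samples and therefore score higher than background sites, without any per-sample CNA calling. Restricting the computation to a diagonal band keeps the cost proportional to the number of sites times the band width.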