Guttman Mitchell, Mies Carolyn, Dudycz-Sulicz Katarzyna, Diskin Sharon J, Baldwin Don A, Stoeckert Christian J, Grant Gregory R
Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
PLoS Genet. 2007 Aug;3(8):e143. doi: 10.1371/journal.pgen.0030143.
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis at the single-sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a priori definition of aberration calls for each sample. If there are multiple samples representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single-sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single-sample aberration calls at the locations of significant concordance, yielding high-resolution calls per sample within concordant regions.
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate accurate detection on simulated data and on real datasets involving known regions of aberration within subtypes of breast cancer, at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array, and both verify known results and detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at http://www.cbil.upenn.edu/MSA.