School of Computing Science, Simon Fraser University, Vancouver, British Columbia V5A 1S6, Canada.
Genome Res. 2012 Nov;22(11):2250-61. doi: 10.1101/gr.136572.111. Epub 2012 Jun 28.
Complex genomic rearrangements (CGRs) are emerging as a new feature of cancer genomes. CGRs are characterized by multiple genomic breakpoints and thus have the potential to simultaneously affect multiple genes, fusing some genes and interrupting other genes. Analysis of high-throughput whole-genome shotgun sequencing (WGSS) is beginning to facilitate the discovery and characterization of CGRs, but further development of computational methods is required. We have developed an algorithmic method for identifying CGRs in WGSS data based on shortest alternating paths in breakpoint graphs. Aiming for a method with the highest possible sensitivity, we use breakpoint graphs built from all WGSS data, including sequences with ambiguous genomic origin. Since the majority of cell function is encoded by the transcriptome, we target our search to find CGRs that underlie fusion transcripts predicted from matched high-throughput cDNA sequencing (RNA-seq). We have applied our method, nFuse, to the discovery of CGRs in publicly available data from the well-studied breast cancer cell line HCC1954 and primary prostate tumor sample 963. We first establish the sensitivity and specificity of the nFuse breakpoint prediction and scoring method using breakpoints previously discovered in HCC1954. We then validate five out of six CGRs in HCC1954 and two out of two CGRs in 963. We show examples of gene fusions that would be difficult to discover using methods that do not account for the existence of CGRs, including one important event that was missed in a previous study of the HCC1954 genome. Finally, we illustrate how CGRs may be used to infer the gene expression history of a tumor.
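The idea of a shortest alternating path in a breakpoint graph can be illustrated with a small sketch. This is not the nFuse implementation; it is a generic Dijkstra search under stated assumptions. Vertices are break-ends. Edges are either "breakpoint" edges (two break-ends joined in the tumor genome) or "reference" edges (two break-ends adjacent in the reference genome). A valid path must alternate between the two kinds. The function name, the edge tuple layout, and the weights (zero for breakpoint edges, genomic distance in base pairs for reference edges) are all hypothetical choices made for illustration.

```python
import heapq

# Hypothetical edge kinds; the names are an assumption of this sketch.
OTHER = {"breakpoint": "reference", "reference": "breakpoint"}

def shortest_alternating_path(edges, source, target, first_kind="breakpoint"):
    """Return (cost, [nodes]) for the cheapest path from source to target
    whose edges alternate in kind, starting with first_kind; None if absent.

    edges: iterable of (u, v, kind, weight) describing undirected edges
    with non-negative weights.
    """
    adj = {}
    for u, v, kind, w in edges:
        adj.setdefault(u, []).append((v, kind, w))
        adj.setdefault(v, []).append((u, kind, w))

    # A search state is (node, kind the next edge must have).
    start = (source, first_kind)
    dist = {start: 0}
    prev = {}
    heap = [(0, source, first_kind)]
    while heap:
        d, node, need = heapq.heappop(heap)
        state = (node, need)
        if d > dist.get(state, float("inf")):
            continue  # stale heap entry
        if node == target:
            # Walk the predecessor chain back to the start state.
            path = [node]
            while state in prev:
                state = prev[state]
                path.append(state[0])
            return d, path[::-1]
        for nxt, kind, w in adj.get(node, []):
            if kind != need:
                continue  # taking this edge would break the alternation
            nstate = (nxt, OTHER[kind])
            nd = d + w
            if nd < dist.get(nstate, float("inf")):
                dist[nstate] = nd
                prev[nstate] = state
                heapq.heappush(heap, (nd, nxt, OTHER[kind]))
    return None

# Toy graph: the direct B-D breakpoint edge cannot follow the A-B breakpoint
# edge, so the search must cross the B-C reference segment instead.
toy_edges = [
    ("A", "B", "breakpoint", 0),
    ("B", "D", "breakpoint", 0),
    ("B", "C", "reference", 300),
    ("C", "D", "breakpoint", 0),
]
result = shortest_alternating_path(toy_edges, "A", "D")
print(result)  # (300, ['A', 'B', 'C', 'D'])
```

Tracking the required kind of the next edge as part of the search state is what enforces alternation: the same break-end can be reached in two distinct states, so an ordinary shortest-path search over nodes alone would not be sufficient.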