Zhu Kaiyuan, Jones Matthew G, Luebeck Jens, Bu Xinxin, Yi Hyerim, Hung King L, Wong Ivy Tsz-Lo, Zhang Shu, Mischel Paul S, Chang Howard Y, Bafna Vineet
Department of Computer Science & Engineering, UC San Diego, La Jolla, CA, USA.
These authors contributed equally to this work.
bioRxiv. 2024 May 18:2024.02.15.580594. doi: 10.1101/2024.02.15.580594.
Extrachromosomal DNA (ecDNA) is a central mechanism for focal oncogene amplification in cancer, occurring in approximately 15% of early-stage cancers and 30% of late-stage cancers. EcDNAs drive tumor formation, evolution, and drug resistance by dynamically modulating oncogene copy number and rewiring gene-regulatory networks. Elucidating the genomic architecture of ecDNA amplifications is critical for understanding tumor pathology and developing more effective therapies. Paired-end short-read (Illumina) sequencing and mapping have been used to represent ecDNA amplifications as a breakpoint graph, in which the inferred architecture of an ecDNA is encoded as a cycle. Traversals of the breakpoint graph have been used to successfully predict ecDNA presence in cancer samples. However, short-read technologies are intrinsically limited in three respects: identifying breakpoints, phasing complex rearrangements and internal duplications, and deconvolving cell-to-cell heterogeneity of ecDNA structures. Long-read technologies, such as those from Oxford Nanopore Technologies, have the potential to improve inference because longer reads map structural variants more reliably and are more likely to span rearranged or duplicated regions. Here, we propose CoRAL (Complete Reconstruction of Amplifications with Long reads) for reconstructing ecDNA architectures from long-read data. CoRAL reconstructs likely cyclic architectures using quadratic programming that simultaneously optimizes the parsimony of the reconstruction, the explained copy number, and the consistency of long-read mapping. Compared with previous short-read-based tools, CoRAL substantially improves reconstructions in extensive simulations and in 9 datasets from previously characterized cell lines. As long-read usage becomes widespread, we anticipate that CoRAL will be a valuable tool for profiling the landscape and evolution of focal amplifications in tumors.
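As an illustration of the "explained copy number" idea mentioned in the abstract, the following minimal Python sketch scores one candidate cyclic path against a toy set of amplified segments. It is not CoRAL's actual objective or code: the segment names, lengths, copy numbers, the `explained_cn_fraction` helper, and the length-weighted scoring rule are all assumptions made here for illustration only.

```python
from collections import Counter

# Hypothetical toy data: segment name -> (length in bp, estimated copy number).
segments = {
    "A": (100_000, 20.0),
    "B": (50_000, 40.0),
    "C": (150_000, 20.0),
}

# A candidate cyclic path: ordered (segment, orientation) pairs. The last
# element is implicitly joined back to the first. Segment B is traversed
# twice, modelling an internal duplication on the ecDNA.
cycle = [("A", "+"), ("B", "+"), ("C", "-"), ("B", "+")]


def explained_cn_fraction(segments, cycle, cycle_copies):
    """Length-weighted fraction of total copy number explained by one cycle.

    cycle_copies is the assumed number of copies of the whole cycle. Each
    segment contributes (visits in cycle) * cycle_copies, capped at the
    segment's own copy number.
    """
    visits = Counter(name for name, _ in cycle)
    total = sum(length * cn for length, cn in segments.values())
    explained = sum(
        length * min(cn, visits[name] * cycle_copies)
        for name, (length, cn) in segments.items()
    )
    return explained / total


fraction = explained_cn_fraction(segments, cycle, cycle_copies=20.0)
print(f"Explained copy-number fraction: {fraction:.2f}")
```

In this toy setting, a single cycle that visits B twice accounts for B's doubled copy number, so one cycle at 20 copies explains all of the amplified signal. A method of the kind the abstract describes would additionally weigh how few cycles are used and how well long reads support each junction.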