Gjoni Ketrin, Zhang Shu, Yan Rachel E, Zhang Bo, Miller Daniel, Resnick Adam, Dahmane Nadia, Pollard Katherine S
bioRxiv. 2025 Apr 2:2025.03.28.645984. doi: 10.1101/2025.03.28.645984.
Structural variants (SVs) are increasingly recognized as important contributors to oncogenesis through their effects on 3D genome folding. Recent advances in whole-genome sequencing have enabled large-scale profiling of SVs across diverse tumors, yet experimental characterization of their individual impact on genome folding remains infeasible. Here, we leveraged a convolutional neural network, Akita, to predict disruptions in genome folding caused by somatic SVs identified in 61 tumor types from the Children's Brain Tumor Network dataset. Our analysis reveals significant variability in SV-induced disruptions across tumor types, with the most disruptive SVs coming from lymphomas and sarcomas, metastatic tumors, and germ cell tumors. Dimensionality reduction of disruption scores identified five recurrently disrupted regions enriched for high-impact SVs across multiple tumors. Some of these regions are highly disrupted despite not being highly mutated, and harbor tumor-associated genes and transcriptional regulators. To further interpret the functional relevance of high-scoring SVs, we integrated epigenetic data and developed a modified Activity-by-Contact scoring approach to prioritize SVs with disrupted genome contacts at active enhancers. This method highlighted highly disruptive SVs near key oncogenes, as well as novel candidate loci potentially implicated in tumorigenesis. These findings demonstrate the utility of machine learning for identifying novel SVs, loci, and genetic mechanisms contributing to pediatric cancers. This framework provides a foundation for future studies linking SV-driven regulatory changes to cancer pathogenesis.
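The abstract does not describe how the Activity-by-Contact (ABC) score was modified, so the following is only an illustrative sketch of the general idea, not the authors' method. It uses the standard ABC formulation (enhancer activity times enhancer-promoter contact, normalized over all candidate elements for a gene) and compares scores computed from reference contacts against scores computed from SV-altered contacts. The names `Enhancer`, `abc_scores`, and `abc_shift`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Enhancer:
    name: str
    activity: float      # assumed: a normalized enhancer-activity signal
    contact_ref: float   # assumed: predicted promoter contact, reference genome
    contact_alt: float   # assumed: predicted promoter contact, with the SV


def abc_scores(activities, contacts):
    """Standard ABC: activity x contact, normalized over all candidate elements."""
    products = [a * c for a, c in zip(activities, contacts)]
    total = sum(products)
    if total == 0:
        return [0.0 for _ in products]
    return [p / total for p in products]


def abc_shift(enhancers):
    """Hypothetical comparison: change in ABC score (SV minus reference) per enhancer."""
    acts = [e.activity for e in enhancers]
    ref = abc_scores(acts, [e.contact_ref for e in enhancers])
    alt = abc_scores(acts, [e.contact_alt for e in enhancers])
    return {e.name: a - r for e, r, a in zip(enhancers, ref, alt)}


# Made-up example: the SV weakens the contact between E1 and the promoter.
enhancers = [
    Enhancer("E1", activity=2.0, contact_ref=0.5, contact_alt=0.1),
    Enhancer("E2", activity=1.0, contact_ref=0.5, contact_alt=0.5),
]
shift = abc_shift(enhancers)
```

In this toy case the share of regulatory input attributed to E1 falls and the share attributed to E2 rises by the same amount, because ABC scores for a gene are normalized to sum to one. An SV with a large negative shift at a highly active enhancer would be the kind of event such a prioritization aims to surface.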