Department of Computer Science, Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain.
Department of Life Science, Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain.
Sci Rep. 2022 Feb 28;12(1):3244. doi: 10.1038/s41598-022-07211-6.
For many years, a major question in cancer genomics has been how to identify the variants that play a functional role in cancer and distinguish them from the majority of genomic changes, which have no functional consequences. This is particularly challenging for complex chromosomal rearrangements, which are often composed of multiple DNA breaks and are therefore difficult to classify and interpret functionally. Despite recent efforts to classify structural variants (SVs), more robust statistical frameworks are needed to classify these variants and to isolate those that arise from specific molecular mechanisms. We present a new statistical approach that analyzes SV patterns in 2392 tumor samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium and identifies significantly recurrent patterns, which can provide insight into the mechanisms involved in tumor biology. The method combines recursive kernel density estimation (KDE) clustering of 152,926 SVs with randomization methods, graph mining techniques and statistical measures. The proposed methodology not only identified complex patterns across different cancer types but also showed that they are not random occurrences. Furthermore, it identified a new, previously undescribed class of pattern.
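The abstract does not describe how the recursive KDE clustering is carried out. Purely as an illustration of the general idea, the Python sketch below assumes that SV breakpoints have been reduced to 1-D positions on a single chromosome, estimates their density with a Gaussian kernel whose bandwidth follows Silverman's rule of thumb, and splits each group at local minima of its own density estimate, recursing until no further minima appear. The function names, the bandwidth rule, the evaluation grid size and the `min_size`/`max_depth` stopping parameters are hypothetical choices, not the authors' settings.

```python
import math
from statistics import stdev

# Hypothetical sketch of recursive 1-D KDE clustering of breakpoint positions;
# it is NOT the authors' implementation, and all parameters are assumptions.


def silverman_bandwidth(points):
    """Silverman's rule of thumb for a 1-D Gaussian kernel: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * stdev(points) * len(points) ** (-0.2)


def gaussian_kde(points, bandwidth):
    """Return a function that evaluates the Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def density_minima(points, bandwidth, grid_size=200):
    """Evaluate the KDE on an even grid and return x positions of its local minima."""
    lo, hi = min(points), max(points)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(points, bandwidth)
    values = [density(x) for x in grid]
    # The asymmetric < / <= test reports one grid point per minimum, even on ties.
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]


def split_at(points, cuts):
    """Split sorted points into consecutive groups separated by the cut positions."""
    groups, current, ci = [], [], 0
    for p in points:
        # Close the current group each time p moves past a cut position.
        while ci < len(cuts) and p > cuts[ci]:
            if current:
                groups.append(current)
            current = []
            ci += 1
        current.append(p)
    if current:
        groups.append(current)
    return groups


def recursive_kde_clusters(points, min_size=3, depth=0, max_depth=10):
    """Recursively split positions at minima of each subgroup's own KDE.

    Each subgroup gets a fresh bandwidth computed from its own spread, so
    coarse clusters are separated first and then refined at finer scales.
    """
    points = sorted(points)
    # Stop on small groups, identical positions, or excessive recursion depth.
    if len(points) < min_size or points[0] == points[-1] or depth >= max_depth:
        return [points]
    cuts = density_minima(points, silverman_bandwidth(points))
    groups = split_at(points, cuts)
    if len(groups) == 1:
        return groups
    clusters = []
    for group in groups:
        clusters.extend(recursive_kde_clusters(group, min_size, depth + 1, max_depth))
    return clusters


positions = [100, 101, 102, 103, 104, 200, 201, 202, 203, 204,
             10000, 10001, 10002, 10003, 10004]
print(recursive_kde_clusters(positions))
```

On these toy positions the first pass separates the block near 10,000 from the rest, and the second pass, with a much narrower bandwidth, separates the hotspots near 100 and 200. Recomputing the bandwidth for each subgroup is what makes the procedure recursive in this sketch: a single global bandwidth would tend either to merge nearby hotspots or to fragment sparse ones.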