Clustering and graph mining techniques for classification of complex structural variations in cancer genomes.

Affiliations

Department of Computer Science, Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain.

Department of Life Science, Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain.

Publication information

Sci Rep. 2022 Feb 28;12(1):3244. doi: 10.1038/s41598-022-07211-6.

DOI: 10.1038/s41598-022-07211-6

PMID: 35228601

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8885672/

Abstract

For many years, a major question in cancer genomics has been how to identify the variations that have a functional role in cancer and distinguish them from the majority of genomic changes, which have no functional consequences. This is particularly challenging for complex chromosomal rearrangements, which often comprise multiple DNA breaks and are therefore difficult to classify and interpret functionally. Despite recent efforts to classify structural variants (SVs), more robust statistical frameworks are needed to classify these variants and to isolate those that arise from specific molecular mechanisms. We present a new statistical approach that analyzes SV patterns in 2,392 tumor samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium and identifies significant recurrence, which can inform on the mechanisms involved in tumor biology. The method combines recursive kernel density estimation (KDE) clustering of 152,926 SVs with randomization methods, graph mining techniques and statistical measures. The proposed methodology not only identified complex patterns across different cancer types but also demonstrated that they are not random occurrences. Furthermore, it identified a new class of pattern that had not been described before.
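The abstract names recursive KDE clustering as the first step but gives no details of it. As a rough illustration only, the sketch below clusters one-dimensional breakpoint coordinates by cutting a Gaussian kernel density estimate at its valleys (local minima), then re-clusters any cluster wider than a chosen span using half the bandwidth. This is not the authors' algorithm: the reference list points to MulticlusterKDE, a multivariate method, whereas the 1-D setting, the valley-cutting rule, the bandwidth-halving recursion and the parameters `bandwidth`, `step`, `max_span` and `min_bandwidth` are all assumptions, and every function name is hypothetical.

```python
import math
from typing import List


def kde_density(points: List[float], grid: List[float], bandwidth: float) -> List[float]:
    """Evaluate a 1-D Gaussian kernel density estimate at each grid position."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def kde_clusters(points: List[float], bandwidth: float, step: float) -> List[List[float]]:
    """Hypothetical helper: split points into clusters at valleys of the KDE curve."""
    pts = sorted(points)
    if len(pts) < 2:
        return [pts] if pts else []
    lo, hi = pts[0], pts[-1]
    n_steps = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n_steps + 1)]
    dens = kde_density(pts, grid, bandwidth)
    # A cut is an interior grid point that is a local minimum of the density:
    # lower than its left neighbour and no higher than its right neighbour.
    cuts = [
        grid[i]
        for i in range(1, len(grid) - 1)
        if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]
    ]
    clusters: List[List[float]] = []
    current: List[float] = []
    ci = 0
    for p in pts:
        # Close the running cluster each time a cut lies to the left of p.
        while ci < len(cuts) and p > cuts[ci]:
            if current:
                clusters.append(current)
                current = []
            ci += 1
        current.append(p)
    clusters.append(current)
    return clusters


def recursive_kde_clusters(
    points: List[float], bandwidth: float, max_span: float, min_bandwidth: float
) -> List[List[float]]:
    """Hypothetical recursion: re-cluster clusters wider than max_span at half the bandwidth."""
    result: List[List[float]] = []
    for c in kde_clusters(points, bandwidth, step=bandwidth / 4):
        if c[-1] - c[0] > max_span and bandwidth / 2 >= min_bandwidth:
            result.extend(recursive_kde_clusters(c, bandwidth / 2, max_span, min_bandwidth))
        else:
            result.append(c)
    return result


if __name__ == "__main__":
    breakpoints = [1000, 1050, 1100, 50000, 50020, 50100, 200000]
    print(recursive_kde_clusters(breakpoints, bandwidth=500, max_span=10_000, min_bandwidth=50))
    # → [[1000, 1050, 1100], [50000, 50020, 50100], [200000]]
    # A wide bandwidth first merges two groups; the recursion then separates them.
    print(recursive_kde_clusters([0, 100, 200, 3000, 3100], bandwidth=2000, max_span=1000, min_bandwidth=100))
    # → [[0, 100, 200], [3000, 3100]]
```

The brute-force density sum costs one kernel evaluation per grid point per breakpoint, which keeps the idea visible but would likely be too slow for a dataset the size of the one described here; a binned or FFT-based KDE would be the usual substitute.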

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d059/8885672/85c21ba6f315/41598_2022_7211_Fig1_HTML.jpg

Similar articles

Clustering and graph mining techniques for classification of complex structural variations in cancer genomes.

Sci Rep. 2022 Feb 28;12(1):3244. doi: 10.1038/s41598-022-07211-6.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Structural Variation in Cancer: Role, Prevalence, and Mechanisms.

Annu Rev Genomics Hum Genet. 2022 Aug 31;23:123-152. doi: 10.1146/annurev-genom-120121-101149. Epub 2022 Jun 2.

Pan-cancer analysis of whole genomes.

Nature. 2020 Feb;578(7793):82-93. doi: 10.1038/s41586-020-1969-6. Epub 2020 Feb 5.

svclassify: a method to establish benchmark structural variant calls.

BMC Genomics. 2016 Jan 16;17:64. doi: 10.1186/s12864-016-2366-2.

Patterns of somatic structural variation in human cancer genomes.

Nature. 2020 Feb;578(7793):112-121. doi: 10.1038/s41586-019-1913-9. Epub 2020 Feb 5.

Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer.

Nat Genet. 2020 Mar;52(3):294-305. doi: 10.1038/s41588-019-0564-y. Epub 2020 Feb 5.

Toward Recovering Allele-specific Cancer Genome Graphs.

J Comput Biol. 2018 Jul;25(7):624-636. doi: 10.1089/cmb.2018.0022. Epub 2018 Apr 16.

Structural variations in cancer and the 3D genome.

Nat Rev Cancer. 2022 Sep;22(9):533-546. doi: 10.1038/s41568-022-00488-9. Epub 2022 Jun 28.

Genomic basis for RNA alterations in cancer.

Nature. 2020 Feb;578(7793):129-136. doi: 10.1038/s41586-020-1970-0. Epub 2020 Feb 5.

Cited by

Robust self supervised symmetric nonnegative matrix factorization to the graph clustering.

Sci Rep. 2025 Mar 1;15(1):7350. doi: 10.1038/s41598-025-92564-x.

References

MulticlusterKDE: a new algorithm for clustering based on multivariate kernel density estimation.

J Appl Stat. 2020 Jul 30;49(1):98-121. doi: 10.1080/02664763.2020.1799958. eCollection 2022.

Patterns of somatic structural variation in human cancer genomes.

Nature. 2020 Feb;578(7793):112-121. doi: 10.1038/s41586-019-1913-9. Epub 2020 Feb 5.

Pan-cancer analysis of whole genomes.

Nature. 2020 Feb;578(7793):82-93. doi: 10.1038/s41586-020-1969-6. Epub 2020 Feb 5.

Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing.

Nat Genet. 2020 Mar;52(3):331-341. doi: 10.1038/s41588-019-0576-7. Epub 2020 Feb 5.

HumCFS: a database of fragile sites in human chromosomes.

BMC Genomics. 2019 Apr 18;19(Suppl 9):985. doi: 10.1186/s12864-018-5330-5.

Punctuated evolution of prostate cancer genomes.

Cell. 2013 Apr 25;153(3):666-77. doi: 10.1016/j.cell.2013.03.021.

NetMODE: network motif detection without Nauty.

PLoS One. 2012;7(12):e50093. doi: 10.1371/journal.pone.0050093. Epub 2012 Dec 18.

Biological network motif detection: principles and practice.

Brief Bioinform. 2012 Mar;13(2):202-15. doi: 10.1093/bib/bbr033. Epub 2011 Jun 20.

Massive genomic rearrangement acquired in a single catastrophic event during cancer development.

Cell. 2011 Jan 7;144(1):27-40. doi: 10.1016/j.cell.2010.11.055.

MODA: an efficient algorithm for network motif discovery in biological networks.

Genes Genet Syst. 2009 Oct;84(5):385-95. doi: 10.1266/ggs.84.385.
