Comprehensive analysis of cancer breakpoints reveals signatures of genetic and epigenetic contribution to cancer genome rearrangements.

Affiliations

Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia.

Faculty of Digital Transformation, ITMO University, St. Petersburg, Russia.

Publication information

PLoS Comput Biol. 2021 Mar 1;17(3):e1008749. doi: 10.1371/journal.pcbi.1008749. eCollection 2021 Mar.

Abstract

Understanding the mechanisms of cancer breakpoint mutagenesis is a difficult task, and predictive models of cancer breakpoint formation have so far failed to achieve even moderate predictive power. Here we take advantage of a machine learning approach that can gather important features from big data and quantify the contribution of different factors. We performed a comprehensive analysis of almost 630,000 cancer breakpoints and quantified the contribution of genomic and epigenomic features: non-B DNA structures, chromatin organization, transcription factor binding sites and epigenetic markers. The results showed that transcription and the formation of non-B DNA structures are the two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed regions, DNA methylation and histone marks, are less informative but still contribute. Among individual features within the groups, G-quadruplexes, repeats, and the transcription factors CTCF, GABPA, RXRA, SP1, MAX and NR2F2 show a relatively high contribution. Overall, the cancer breakpoint landscape can be represented by well-predicted hotspots and poorly predicted individual breakpoints scattered across genomes. We demonstrated that hotspot mutagenesis has genomic and epigenomic factors, and that not all individual cancer breakpoints are random noise; some have a definite mutation signature. In addition, we found a long-range action of some features on breakpoint mutagenesis. By combining omics data and cancer-specific individual feature importance, and by adding distant features to local ones, predictive models for cancer breakpoint formation achieved 70-90% ROC AUC for different cancer types; however, precision remained low at 2% and recall did not exceed 50%. On the one hand, the power of the models strongly correlates with the size of the available cancer breakpoint and epigenomic data; on the other hand, finding strong determinants of cancer breakpoint formation remains a challenge. The strength of the predictive signals of each group, and of each feature inside a group, can be converted into cancer-specific breakpoint mutation signatures. Overall, our results add to the understanding of cancer genome rearrangement processes.
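
The combination of a 70-90% ROC AUC with a precision of about 2% may look contradictory, but it is what one would expect when positives are rare. The sketch below is an illustration only, not the authors' evaluation code: it applies the standard identity that relates precision to recall, false positive rate and class prevalence. The function name `precision_from_rates`, the false positive rate of 0.1 and the prevalence values are hypothetical choices made for the example; only the 50% recall and the roughly 2% precision come from the abstract.

```python
def precision_from_rates(recall: float, fpr: float, prevalence: float) -> float:
    """Precision implied by recall (TPR), false positive rate and prevalence.

    precision = TPR * p / (TPR * p + FPR * (1 - p)), where p is the fraction
    of positive examples (here: genomic windows that contain a breakpoint).
    """
    true_pos = recall * prevalence           # expected fraction of true positives
    false_pos = fpr * (1.0 - prevalence)     # expected fraction of false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: recall 0.5 (the ceiling quoted in the abstract)
# and an assumed false positive rate of 0.1. The prevalence values are assumed
# for illustration; they are not taken from the paper.
for prevalence in (0.5, 0.05, 0.004):
    p = precision_from_rates(recall=0.5, fpr=0.1, prevalence=prevalence)
    print(f"prevalence={prevalence:.3f} -> precision={p:.3f}")
```

With the same recall and false positive rate, precision falls as positives become rarer, reaching roughly 2% at the smallest assumed prevalence. This is why ROC AUC, which does not depend on prevalence, can stay high while precision remains low on heavily imbalanced breakpoint data.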

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3634/7951985/bb2a41bc41da/pcbi.1008749.g001.jpg
