DeepG4: A deep learning approach to predict cell-type specific active G-quadruplex regions.

Affiliations

Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Intégrative (CBI), University of Toulouse, CNRS, UPS, Toulouse, France.

Centre de Recherches en Cancérologie de Toulouse (CRCT), INSERM U1037, Toulouse, France.

Publication information

PLoS Comput Biol. 2021 Aug 12;17(8):e1009308. doi: 10.1371/journal.pcbi.1009308. eCollection 2021 Aug.

Abstract

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double helix. Later, other DNA structures were discovered and shown to play important roles in the cell, in particular the G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to map G4s in vitro based on a canonical sequence motif, G-richness and G-skewness, or alternatively on sequence features including k-mers and, more recently, on machine/deep learning. Recently, new sequencing techniques were developed to map G4s in vitro (G4-seq) and in vivo (G4 ChIP-seq) at a resolution of a few hundred bases. Here, we propose a novel convolutional neural network (DeepG4) to map cell-type specific active G4 regions (i.e. regions within which G4s form both in vitro and in vivo). DeepG4 predicts active G4 regions in different cell types with high accuracy. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 region activity. We found that such motifs do not follow the very flexible sequence pattern that current algorithms search for. Instead, active G4 regions are determined by numerous specific motifs. Among those motifs, we identified known transcription factors (TFs) that could play important roles in G4 activity, either directly by contributing to the G4 structures themselves or indirectly by participating in G4 formation in the vicinity. In addition, we used DeepG4 to predict active G4 regions in a large number of tissues and cancers, thereby providing a comprehensive resource for researchers. Availability: https://github.com/morphos30/DeepG4.
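For context, the "canonical sequence motif" used by earlier algorithms is commonly written as four runs of at least three guanines separated by loops of one to seven nucleotides (the Quadparser-style pattern). The sketch below scans one strand for that fixed pattern using Python's `re` module. It is only an illustration of the flexible pattern the abstract contrasts with DeepG4, not part of DeepG4, which learns its motifs from data. The function name `find_putative_g4` and the A/C/G/T-only loop alphabet are illustrative assumptions.

```python
import re

# Quadparser-style pattern (assumed form): at least four G-tracts of >= 3 Gs,
# each separated by a loop of 1-7 nucleotides drawn from A/C/G/T.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3,}G{3,}")


def find_putative_g4(seq: str) -> list[tuple[int, str]]:
    """Return (start, matched_sequence) for non-overlapping canonical G4 motifs.

    Hypothetical helper: scans only the given strand; the opposite strand
    would require scanning the reverse complement as well.
    """
    seq = seq.upper()  # tolerate soft-masked (lowercase) genome sequence
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Human telomeric repeat flanked by A's: one canonical motif starting at index 2.
    print(find_putative_g4("AAGGGTTAGGGTTAGGGTTAGGGAA"))
```

Because the loops may be any of 1 to 7 nucleotides, this single pattern matches a very large family of sequences, which is the kind of flexibility the authors report active G4 regions do not follow.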
