Identification and analysis of consensus RNA motifs binding to the genome regulator CTCF.

Author information

Kuang Shuzhen, Wang Liangjiang

Affiliations

Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA.

Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA.

Publication information

NAR Genom Bioinform. 2020 May 6;2(2):lqaa031. doi: 10.1093/nargab/lqaa031. eCollection 2020 Jun.

Abstract

CCCTC-binding factor (CTCF) is a key regulator of 3D genome organization and gene expression. Recent studies suggest that RNA transcripts, mostly long non-coding RNAs (lncRNAs), can serve as locus-specific factors that bind CTCF and recruit it to chromatin. However, it remains unclear whether CTCF-binding RNA sites share specific sequence patterns, and no RNA motif has been reported so far for CTCF binding. In this study, we have developed DeepLncCTCF, a new deep learning model based on a convolutional neural network and a bidirectional long short-term memory network, to discover the RNA recognition patterns of CTCF and identify candidate lncRNAs binding to CTCF. When evaluated on two different datasets, a human U2OS dataset and a mouse embryonic stem cell (ESC) dataset, DeepLncCTCF accurately predicted CTCF-binding RNA sites from nucleotide sequence. By examining the sequence features learned by DeepLncCTCF, we discovered a novel RNA motif with the consensus sequence AGAUNGGA for potential CTCF binding in humans. Furthermore, the applicability of DeepLncCTCF was demonstrated by identifying nearly 5000 candidate lncRNAs that might bind to CTCF in the nucleus. Our results provide useful information for understanding the molecular mechanisms of CTCF function in 3D genome organization.
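
To make the consensus notation concrete: in IUPAC nomenclature, N stands for any nucleotide, so AGAUNGGA describes four 8-mers (AGAUAGGA, AGAUCGGA, AGAUGGGA, AGAUUGGA). The Python sketch below scans a transcript for exact matches to such a consensus. It is an illustration only and not part of DeepLncCTCF, which learns sequence features with a neural network rather than by pattern matching. The function names and the toy sequence are hypothetical, and a match does not by itself indicate CTCF binding.

```python
import re

# Standard IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are also reported by finditer().
    """
    body = "".join(IUPAC_RNA[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_motif_sites(sequence, consensus="AGAUNGGA"):
    """Return (0-based start, matched 8-mer) for each exact consensus match.

    DNA-style input is accepted: T is converted to U before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's datasets.
    toy = "CCAGAUCGGAUUAGATAGGAGG"
    for start, site in find_motif_sites(toy):
        print(start, site)
```

Exact consensus matching is the simplest possible scan; a position weight matrix or the trained model itself would be needed to score near-matches or sequence context.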

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f92/7671415/8edcbe6f07bf/lqaa031fig1.jpg
