
Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning.

Affiliations

School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, 130117, China.

Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada.

Publication Information

Adv Sci (Weinh). 2024 Oct;11(39):e2407013. doi: 10.1002/advs.202407013. Epub 2024 Aug 19.

Abstract

The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints resemble the grammar and syntax of human languages and can be modeled by advanced natural language techniques such as Transformers, which have been highly effective at modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RBP binding sites and m6A RNA modification sites, and predicting RNA sub-cellular localization. Benchmark results show that 3UTRBERT generally outperforms other contemporary methods on each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationships between sequence elements and effectively identifies regions with important regulatory potential. The 3UTRBERT model is expected to serve as a foundational tool for various sequence-labeling tasks within the 3'UTR field, thereby enhancing the decipherability of post-transcriptional regulatory mechanisms.
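To make the pipeline concrete, below is a minimal sketch of the task-agnostic pre-training stage, assuming a DNABERT-style overlapping k-mer tokenization and the Hugging Face transformers library; the k-mer size, toy vocabulary, model dimensions, and example sequence are illustrative choices, not the paper's reported configuration.

```python
# Minimal sketch of masked-language-model pre-training on k-mer tokens of a
# 3'UTR sequence. All sizes here are toy assumptions, not the paper's setup.
from itertools import product
import torch
from transformers import BertConfig, BertForMaskedLM

K = 3  # k-mer size; an assumption for illustration

def kmerize(seq: str, k: int = K) -> list[str]:
    """Split an RNA sequence into overlapping k-mer tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy vocabulary: special tokens plus all 4^K k-mers over {A, C, G, U}.
special = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {t: i for i, t in enumerate(special + ["".join(p) for p in product("ACGU", repeat=K)])}

config = BertConfig(vocab_size=len(vocab), hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4,
                    max_position_embeddings=512)
model = BertForMaskedLM(config)

# Encode one 3'UTR fragment and mask a single token for the MLM objective.
seq = "AUGGCUUACGAUUAGC"
ids = [vocab["[CLS]"]] + [vocab.get(t, vocab["[UNK]"]) for t in kmerize(seq)] + [vocab["[SEP]"]]
input_ids = torch.tensor([ids])
labels = torch.full_like(input_ids, -100)         # -100 = ignored by the loss
masked_pos = 5
labels[0, masked_pos] = input_ids[0, masked_pos]  # predict the true k-mer here
input_ids[0, masked_pos] = vocab["[MASK]"]

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # gradient for one pre-training step
print("MLM loss:", float(loss))
```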

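The fine-tuning stage can be sketched the same way: the pre-trained encoder receives a classification head and is trained on labeled sites. The binary task, sequences, and labels below are invented toy examples; identifying RBP binding sites or m6A sites would follow the same shape.

```python
# Hedged sketch of fine-tuning for one downstream task (binary site
# classification). Data and checkpoint path here are hypothetical.
from itertools import product
import torch
from transformers import BertConfig, BertForSequenceClassification

K = 3
special = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {t: i for i, t in enumerate(special + ["".join(p) for p in product("ACGU", repeat=K)])}

def encode(seq: str) -> torch.Tensor:
    kmers = [seq[i:i + K] for i in range(len(seq) - K + 1)]
    ids = [vocab["[CLS]"]] + [vocab.get(t, vocab["[UNK]"]) for t in kmers] + [vocab["[SEP]"]]
    return torch.tensor([ids])

config = BertConfig(vocab_size=len(vocab), hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4, num_labels=2)
clf = BertForSequenceClassification(config)
# With a real pre-trained checkpoint one would instead load it, e.g.
# (path is hypothetical):
# clf = BertForSequenceClassification.from_pretrained("path/to/3utrbert", num_labels=2)

optimizer = torch.optim.AdamW(clf.parameters(), lr=2e-5)
toy_batch = [("AUGGCUUACGAUUAGC", 1), ("CCCGGAUAACGGUUAA", 0)]  # invented examples
for seq, label in toy_batch:
    out = clf(input_ids=encode(seq), labels=torch.tensor([label]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print("final loss:", float(out.loss))
```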

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4846/11497048/d2f5b5032eb7/ADVS-11-2407013-g005.jpg
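For the attention-based interpretation the abstract highlights, one common recipe, assumed here rather than taken from the paper, is to average the attention that each token receives from the [CLS] token across layers and heads. This sketch continues directly from the fine-tuning example above, reusing its clf model and encode helper.

```python
# Continues the fine-tuning sketch: score positions by the attention they
# receive from [CLS], averaged over layers and heads. The paper's exact
# attribution procedure may differ from this common recipe.
import torch

clf.eval()
with torch.no_grad():
    out = clf(input_ids=encode("AUGGCUUACGAUUAGC"), output_attentions=True)

# out.attentions is a tuple with one (batch, heads, len, len) tensor per layer.
att = torch.stack(out.attentions)                 # (layers, batch, heads, len, len)
cls_to_tok = att[:, 0, :, 0, :].mean(dim=(0, 1))  # mean attention from [CLS] per token
top = torch.topk(cls_to_tok, k=3).indices.tolist()
print("highest-attention token positions:", top)
```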
