
Deciphering 3'UTR Mediated Gene Regulation Using Interpretable Deep Representation Learning.

Affiliations

School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, 130117, China.

Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada.

Publication Information

Adv Sci (Weinh). 2024 Oct;11(39):e2407013. doi: 10.1002/advs.202407013. Epub 2024 Aug 19.

Abstract

The 3' untranslated regions (3'UTRs) of messenger RNAs contain many important cis-regulatory elements that are under functional and evolutionary constraints. It is hypothesized that these constraints resemble the grammar and syntax of human languages and can be modeled by advanced natural language techniques such as Transformers, which have been highly effective at modeling complex protein sequences and structures. Here 3UTRBERT is described, which implements an attention-based language model, i.e., Bidirectional Encoder Representations from Transformers (BERT). 3UTRBERT is pre-trained on aggregated 3'UTR sequences of human mRNAs in a task-agnostic manner; the pre-trained model is then fine-tuned for specific downstream tasks such as identifying RBP binding sites and m6A RNA modification sites, and predicting RNA sub-cellular localization. Benchmark results show that 3UTRBERT generally outperforms other contemporary methods on each of these tasks. More importantly, the self-attention mechanism within 3UTRBERT allows direct visualization of the semantic relationships between sequence elements and effectively identifies regions with important regulatory potential. The 3UTRBERT model is expected to serve as a foundational tool for various sequence-labeling tasks within the 3'UTR field, thereby enhancing the decipherability of post-transcriptional regulatory mechanisms.
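To make the pipeline concrete, below is a minimal sketch of the task-agnostic pre-training stage, assuming a DNABERT-style overlapping k-mer tokenization and the Hugging Face transformers library; the k-mer size, toy vocabulary, model dimensions, and example sequence are illustrative choices, not the paper's reported configuration.

```python
# Minimal sketch of masked-language-model pre-training on k-mer tokens of a
# 3'UTR sequence. All sizes here are toy assumptions, not the paper's setup.
from itertools import product
import torch
from transformers import BertConfig, BertForMaskedLM

K = 3  # k-mer size; an assumption for illustration

def kmerize(seq: str, k: int = K) -> list[str]:
    """Split an RNA sequence into overlapping k-mer tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy vocabulary: special tokens plus all 4^K k-mers over {A, C, G, U}.
special = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {t: i for i, t in enumerate(special + ["".join(p) for p in product("ACGU", repeat=K)])}

config = BertConfig(vocab_size=len(vocab), hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4,
                    max_position_embeddings=512)
model = BertForMaskedLM(config)

# Encode one 3'UTR fragment and mask a single token for the MLM objective.
seq = "AUGGCUUACGAUUAGC"
ids = [vocab["[CLS]"]] + [vocab.get(t, vocab["[UNK]"]) for t in kmerize(seq)] + [vocab["[SEP]"]]
input_ids = torch.tensor([ids])
labels = torch.full_like(input_ids, -100)         # -100 = ignored by the loss
masked_pos = 5
labels[0, masked_pos] = input_ids[0, masked_pos]  # predict the true k-mer here
input_ids[0, masked_pos] = vocab["[MASK]"]

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # gradient for one pre-training step
print("MLM loss:", float(loss))
```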

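The fine-tuning stage can be sketched the same way: the pre-trained encoder receives a classification head and is trained on labeled sites. The binary task, sequences, and labels below are invented toy examples; identifying RBP binding sites or m6A sites would follow the same shape.

```python
# Hedged sketch of fine-tuning for one downstream task (binary site
# classification). Data and checkpoint path here are hypothetical.
from itertools import product
import torch
from transformers import BertConfig, BertForSequenceClassification

K = 3
special = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
vocab = {t: i for i, t in enumerate(special + ["".join(p) for p in product("ACGU", repeat=K)])}

def encode(seq: str) -> torch.Tensor:
    kmers = [seq[i:i + K] for i in range(len(seq) - K + 1)]
    ids = [vocab["[CLS]"]] + [vocab.get(t, vocab["[UNK]"]) for t in kmers] + [vocab["[SEP]"]]
    return torch.tensor([ids])

config = BertConfig(vocab_size=len(vocab), hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4, num_labels=2)
clf = BertForSequenceClassification(config)
# With a real pre-trained checkpoint one would instead load it, e.g.
# (path is hypothetical):
# clf = BertForSequenceClassification.from_pretrained("path/to/3utrbert", num_labels=2)

optimizer = torch.optim.AdamW(clf.parameters(), lr=2e-5)
toy_batch = [("AUGGCUUACGAUUAGC", 1), ("CCCGGAUAACGGUUAA", 0)]  # invented examples
for seq, label in toy_batch:
    out = clf(input_ids=encode(seq), labels=torch.tensor([label]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print("final loss:", float(out.loss))
```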

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4846/11497048/d2f5b5032eb7/ADVS-11-2407013-g005.jpg
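For the attention-based interpretation the abstract highlights, one common recipe, assumed here rather than taken from the paper, is to average the attention that each token receives from the [CLS] token across layers and heads. This sketch continues directly from the fine-tuning example above, reusing its clf model and encode helper.

```python
# Continues the fine-tuning sketch: score positions by the attention they
# receive from [CLS], averaged over layers and heads. The paper's exact
# attribution procedure may differ from this common recipe.
import torch

clf.eval()
with torch.no_grad():
    out = clf(input_ids=encode("AUGGCUUACGAUUAGC"), output_attentions=True)

# out.attentions is a tuple with one (batch, heads, len, len) tensor per layer.
att = torch.stack(out.attentions)                 # (layers, batch, heads, len, len)
cls_to_tok = att[:, 0, :, 0, :].mean(dim=(0, 1))  # mean attention from [CLS] per token
top = torch.topk(cls_to_tok, k=3).indices.tolist()
print("highest-attention token positions:", top)
```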
