
Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers.

Affiliations

Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Via P. Vivarelli, 10, Modena, Emilia Romagna 41125, Italy.

Department of Control and Computer Engineering, Corso Duca degli Abruzzi, 24, Turin, Piedmont 10129 Italy.

Publication information

Comput Methods Programs Biomed. 2022 Oct;225:107035. doi: 10.1016/j.cmpb.2022.107035. Epub 2022 Aug 7.

DOI: 10.1016/j.cmpb.2022.107035
PMID: 35970054

Abstract

BACKGROUND AND OBJECTIVES

In recent years, the prediction of gene expression levels has become crucial because of its potential clinical applications. In this context, Xpresso and other methods based on Convolutional Neural Networks and Transformers were first proposed for this purpose. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in highly sparse matrices. In addition, post-transcriptional regulation processes, which are of utmost importance in gene expression, are not considered in these models.
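The sparsity of one-hot encoding can be checked directly. The following is a minimal sketch, not code from the paper: the function names `one_hot` and `sparsity` and the choice to leave unknown bases as all-zero rows are assumptions made for illustration.

```python
def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA string as a list of one-hot rows, one row per base."""
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:  # assumption: unknown bases such as N stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def sparsity(matrix):
    """Return the fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(value == 0 for row in matrix for value in row)
    return zeros / total


encoded = one_hot("ACGTACGT")
print(sparsity(encoded))  # three of every four entries are zero
```

With a four-letter alphabet, each row holds a single 1, so at least 75% of the entries are zero. If the tokens are k-mers instead of single bases, the row width grows to 4^k and the zero fraction rises to 1 - 1/4^k.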

METHODS

This paper presents Transformer DeepLncLoc, a novel method that predicts mRNA abundance (i.e., gene expression levels) from gene promoter sequences, treating the problem as a regression task. The model uses a transformer-based architecture and introduces the DeepLncLoc method to perform the data embedding. Because DeepLncLoc is based on the word2vec algorithm, it avoids the sparse-matrix problem.
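To illustrate why a word2vec-style embedding is dense, here is a rough sketch of a subsequence embedding in the spirit of DeepLncLoc: the sequence is tokenized into k-mers, split into consecutive chunks, and each chunk is represented by the mean of its k-mer vectors. Everything specific here is an assumption: the values of `K`, `DIM`, and `N_CHUNKS` are arbitrary, and the random `VOCAB` table only stands in for vectors that a trained word2vec model would supply.

```python
import itertools

import numpy as np

K = 3          # k-mer length (assumed for illustration)
DIM = 8        # embedding size (assumed)
N_CHUNKS = 4   # number of consecutive subsequences (assumed)

rng = np.random.default_rng(0)
# Hypothetical stand-in for trained word2vec vectors: one dense vector per k-mer.
VOCAB = {
    "".join(p): rng.normal(size=DIM)
    for p in itertools.product("ACGT", repeat=K)
}


def kmers(seq, k=K):
    """Return all overlapping k-mers of the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq, n_chunks=N_CHUNKS):
    """Return an (n_chunks, DIM) matrix holding the mean k-mer vector per chunk."""
    tokens = [t for t in kmers(seq.upper()) if t in VOCAB]
    chunks = np.array_split(np.arange(len(tokens)), n_chunks)
    return np.stack([
        np.mean([VOCAB[tokens[i]] for i in idx], axis=0) if len(idx) else np.zeros(DIM)
        for idx in chunks
    ])


matrix = embed("ACGTACGTTTGACCAGTACG")
print(matrix.shape)  # (4, 8)
```

Unlike a one-hot matrix, every entry of `matrix` carries a real value, and the output size depends on `N_CHUNKS` and `DIM` rather than on the sequence length.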

RESULTS

Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performance compared with state-of-the-art methods. Transformer DeepLncLoc reached 0.76 on the R evaluation metric, compared with 0.74 for Xpresso.
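The abstract does not spell out how the R metric is defined. As a hedged illustration, the sketch below computes two common candidates for a regression task, the Pearson correlation coefficient and the coefficient of determination. The function names and the sample values are invented for the example and are not results from the paper.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up expression values, for illustration only.
observed = [0.5, 1.2, 2.0, 3.1]
predicted = [0.7, 1.0, 2.3, 2.9]
print(pearson_r(observed, predicted), r_squared(observed, predicted))
```

The two quantities differ in general: Pearson r is unaffected by a linear rescaling of the predictions, while the coefficient of determination penalizes any systematic offset.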

CONCLUSION

The multi-headed attention mechanism that characterizes the transformer methodology is suitable for modeling the interactions between DNA locations, outperforming recurrent models. Finally, integrating the transcription factor data into the pipeline leads to substantial gains in predictive power.
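Multi-headed self-attention lets every position in a sequence attend to every other position in a single step, which is what makes it suitable for modeling interactions between distant locations. The following is a generic NumPy sketch of scaled dot-product multi-head self-attention, not the paper's implementation. The dimensions and the random weights are assumptions chosen only to make the example run.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                    # each row sums to 1
    heads = weights @ v                                   # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                         # 6 positions, d_model = 8 (assumed)
params = [rng.normal(size=(8, 8)) for _ in range(4)]  # random stand-in weights
out, attn = multi_head_self_attention(x, *params, n_heads=2)
print(out.shape, attn.shape)  # (6, 8) (2, 6, 6)
```

Each head produces its own 6 x 6 attention map, so different heads can weight different pairs of positions. A recurrent model would instead have to pass information between distant positions through every intermediate step.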

Similar articles

1. Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers.
   Comput Methods Programs Biomed. 2022 Oct;225:107035. doi: 10.1016/j.cmpb.2022.107035. Epub 2022 Aug 7.
2. Predicting gene and protein expression levels from DNA and protein sequences with Perceiver.
   Comput Methods Programs Biomed. 2023 Jun;234:107504. doi: 10.1016/j.cmpb.2023.107504. Epub 2023 Mar 22.
3. DeepLncLoc: a deep learning framework for long non-coding RNA subcellular localization prediction based on subsequence embedding.
   Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab360.
4. MiREx: mRNA levels prediction from gene sequence and miRNA target knowledge.
   BMC Bioinformatics. 2023 Nov 22;24(1):443. doi: 10.1186/s12859-023-05560-1.
5. A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information.
   Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab005.
6. Vision Transformer-based recognition of diabetic retinopathy grade.
   Med Phys. 2021 Dec;48(12):7850-7863. doi: 10.1002/mp.15312. Epub 2021 Nov 16.
7. Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.
   BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):90. doi: 10.1186/s12911-021-01459-0.
8. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks.
   Cell Rep. 2020 May 19;31(7):107663. doi: 10.1016/j.celrep.2020.107663.
9. MCWS-Transformers: Towards an Efficient Modeling of Protein Sequences via Multi Context-Window Based Scaled Self-Attention.
   IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1188-1199. doi: 10.1109/TCBB.2022.3173789. Epub 2023 Apr 3.
10. RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers.
    Sensors (Basel). 2022 May 19;22(10):3849. doi: 10.3390/s22103849.

Cited by

1. Combining diffusion and transformer models for enhanced promoter synthesis and strength prediction in deep learning.
   mSystems. 2025 Apr 22;10(4):e0018325. doi: 10.1128/msystems.00183-25. Epub 2025 Mar 19.
2. Deciphering genomic codes using advanced natural language processing techniques: a scoping review.
   J Am Med Inform Assoc. 2025 Apr 1;32(4):761-772. doi: 10.1093/jamia/ocaf029.
3. TExCNN: Leveraging Pre-Trained Models to Predict Gene Expression from Genomic Sequences.
   Genes (Basel). 2024 Dec 12;15(12):1593. doi: 10.3390/genes15121593.
4. Deciphering genomic codes using advanced NLP techniques: a scoping review.
   ArXiv. 2024 Nov 25:arXiv:2411.16084v1.
5. MiREx: mRNA levels prediction from gene sequence and miRNA target knowledge.
   BMC Bioinformatics. 2023 Nov 22;24(1):443. doi: 10.1186/s12859-023-05560-1.