Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic.

Author information

Dash Tirtharaj, Bornelöv Susanne

Affiliations

Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge CB2 0RE, United Kingdom.

Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom.

Publication information

Bioinform Adv. 2025 Jun 2;5(1):vbaf130. doi: 10.1093/bioadv/vbaf130. eCollection 2025.

Abstract

MOTIVATION

Gene regulation involves complex interactions between transcription factors. While early attempts to predict gene expression were trained using naturally occurring promoters, gigantic parallel reporter assays have vastly expanded potential training data. Despite this, it is still unclear how to best use deep learning to study gene regulation. Here, we investigate the association between promoters and expression using Camformer, a residual convolutional neural network that ranked fourth in the Random Promoter DREAM Challenge 2022. We present the original model trained on 6.7 million sequences and investigate 270 alternative models to find determinants of model performance. Finally, we use explainable AI to uncover regulatory signals.

RESULTS

Camformer accurately decodes the association between promoters and gene expression ( , ) and provides a substantial improvement over the previous state of the art. Using Grad-CAM and in silico mutagenesis, we demonstrate that our model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, a co-occurring UME6 motif instead strongly reduces gene expression. Thus, deep learning models such as Camformer can provide detailed insights into cis-regulatory logic.
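To make the in silico mutagenesis idea concrete, here is a minimal sketch, not the authors' implementation. It substitutes every base of a sequence in turn and records how much a predictor's output changes. The predictor `toy_predict` is a hypothetical stand-in for a trained model such as Camformer, and the motifs `ACTIVATOR` and `REPRESSOR` are made-up placeholders, not the real IME1 or UME6 binding sites. The toy only mimics the kind of hierarchy described above: an activating motif whose effect is reversed when a second motif co-occurs.

```python
import numpy as np

BASES = "ACGT"

# Hypothetical placeholder motifs (NOT the real IME1/UME6 sites).
ACTIVATOR = "GATTACA"
REPRESSOR = "CCGG"


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained expression predictor."""
    has_activator = ACTIVATOR in seq
    has_repressor = REPRESSOR in seq
    if has_activator and has_repressor:
        return -1.0  # co-occurrence flips the activator's effect
    if has_activator:
        return 1.0   # activator alone raises predicted expression
    return 0.0       # baseline


def in_silico_mutagenesis(seq: str, predict) -> np.ndarray:
    """Return a (len(seq), 4) array of prediction changes.

    Entry [i, j] is predict(mutant) - predict(seq), where the mutant
    carries BASES[j] at position i. Entries for the original base are 0.
    """
    reference = predict(seq)
    effects = np.zeros((len(seq), len(BASES)))
    for i, original in enumerate(seq):
        for j, base in enumerate(BASES):
            if base == original:
                continue
            mutant = seq[:i] + base + seq[i + 1:]
            effects[i, j] = predict(mutant) - reference
    return effects


sequence = "TTGATTACATTTTCCGGTT"
effects = in_silico_mutagenesis(sequence, toy_predict)

# Per-position importance: the largest absolute change over all substitutions.
importance = np.abs(effects).max(axis=1)
for position, (base, score) in enumerate(zip(sequence, importance)):
    print(position, base, score)
```

In this toy setting, positions inside either placeholder motif receive non-zero importance and flanking positions receive zero. With a real model, `predict` would wrap a forward pass over an encoded sequence, and the same loop would apply unchanged.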

AVAILABILITY AND IMPLEMENTATION

Data and code are available at: https://github.com/Bornelov-lab/Camformer.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75a6/12188188/a29addbeb121/vbaf130f1.jpg
