A community effort to optimize sequence-based deep learning models of gene regulation.

Authors

Rafi Abdul Muntakim, Nogina Daria, Penzar Dmitry, Lee Dohoon, Lee Danyeong, Kim Nayeon, Kim Sangyeup, Kim Dohyeon, Shin Yeojin, Kwak Il-Youp, Meshcheryakov Georgy, Lando Andrey, Zinkevich Arsenii, Kim Byeong-Chan, Lee Juhyun, Kang Taein, Vaishnav Eeshit Dhaval, Yadollahpour Payman, Kim Sun, Albrecht Jake, Regev Aviv, Gong Wuming, Kulakovskiy Ivan V, Meyer Pablo, de Boer Carl G

Affiliations

University of British Columbia, Vancouver, British Columbia, Canada.

Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia.

Publication information

Nat Biotechnol. 2024 Oct 11. doi: 10.1038/s41587-024-02414-w.

DOI:10.1038/s41587-024-02414-w

PMID:39394483

Abstract

A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.
