A community effort to optimize sequence-based deep learning models of gene regulation.

Author information

Rafi Abdul Muntakim, Nogina Daria, Penzar Dmitry, Lee Dohoon, Lee Danyeong, Kim Nayeon, Kim Sangyeup, Kim Dohyeon, Shin Yeojin, Kwak Il-Youp, Meshcheryakov Georgy, Lando Andrey, Zinkevich Arsenii, Kim Byeong-Chan, Lee Juhyun, Kang Taein, Vaishnav Eeshit Dhaval, Yadollahpour Payman, Kim Sun, Albrecht Jake, Regev Aviv, Gong Wuming, Kulakovskiy Ivan V, Meyer Pablo, de Boer Carl G

Affiliations

University of British Columbia, Vancouver, British Columbia, Canada.

Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia.

Publication information

Nat Biotechnol. 2024 Oct 11. doi: 10.1038/s41587-024-02414-w.

Abstract

A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.
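The idea of testing all possible combinations of modular building blocks can be illustrated with a minimal sketch. The stage names and team labels below are hypothetical placeholders, since the abstract does not list the actual modules; the code only shows how an exhaustive grid of mix-and-match configurations could be enumerated, not how the authors implemented it.

```python
from itertools import product

# Hypothetical stage names and team labels; the abstract does not specify them.
STAGES = ("data_and_training", "first_block", "core_block", "final_block")
TEAMS = ("team_a", "team_b", "team_c")


def enumerate_combinations(stages=STAGES, teams=TEAMS):
    """Return every way to pick one team's implementation for each stage."""
    return [
        dict(zip(stages, choice))
        for choice in product(teams, repeat=len(stages))
    ]


combos = enumerate_combinations()
print(len(combos))  # 3 choices at each of 4 stages gives 81 configurations
print(combos[0])    # the configuration that uses team_a at every stage
```

Each resulting dictionary describes one candidate model that would then be trained and scored on the benchmark suite; in practice, some combinations might be incompatible and would need to be filtered out.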
