A long-context language model for deciphering and generating bacteriophage genomes.

Affiliations

Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.

Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA.

Publication information

Nat Commun. 2024 Oct 30;15(1):9392. doi: 10.1038/s41467-024-53759-4.

Abstract

Inspired by the success of large language models (LLMs), we develop a long-context generative model for genomes. Our multiscale transformer model, megaDNA, is pre-trained on unannotated bacteriophage genomes with nucleotide-level tokenization. We demonstrate the foundational capabilities of our model, including the prediction of essential genes, genetic variant effects, regulatory element activity and the taxonomy of unannotated sequences. Furthermore, it generates de novo sequences of up to 96 kb, which contain potential regulatory elements and annotated proteins with phage-related functions.
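Nucleotide-level tokenization means that every base is its own token, so a 96 kb genome becomes a sequence of about 96,000 tokens before any special tokens are added. That sequence length is why a long-context model is needed. The minimal Python sketch below illustrates the idea under stated assumptions: the `NUCLEOTIDE_VOCAB` mapping and the `tokenize`/`detokenize` helpers are hypothetical and are not taken from megaDNA's code, and rejecting ambiguous bases such as `N` is a simplification chosen for this sketch.

```python
# Hypothetical vocabulary for illustration only: megaDNA's actual token IDs
# and any special tokens (padding, start/end markers) may differ.
NUCLEOTIDE_VOCAB = {"A": 1, "C": 2, "G": 3, "T": 4}
INVERSE_VOCAB = {token: base for base, token in NUCLEOTIDE_VOCAB.items()}


def tokenize(sequence: str) -> list[int]:
    """Map each base of a DNA sequence to exactly one integer token."""
    tokens = []
    for base in sequence.upper():
        # Assumption: ambiguous IUPAC codes (e.g. "N") are rejected here
        # rather than mapped to a dedicated token.
        if base not in NUCLEOTIDE_VOCAB:
            raise ValueError(f"unsupported base: {base!r}")
        tokens.append(NUCLEOTIDE_VOCAB[base])
    return tokens


def detokenize(tokens: list[int]) -> str:
    """Invert tokenize(): turn integer tokens back into a DNA string."""
    return "".join(INVERSE_VOCAB[token] for token in tokens)


if __name__ == "__main__":
    print(tokenize("ATGC"))                 # one token per nucleotide
    print(detokenize(tokenize("acgt")))     # lowercase input is normalized
    print(len(tokenize("A" * 96_000)))      # a 96 kb sequence -> 96,000 tokens
```

Because the token count equals the base count, context length grows linearly with genome size. This is the trade-off that a long-context architecture has to absorb, in exchange for not committing to a k-mer or subword vocabulary.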

[Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dd1/11525655/d4673651d333/41467_2024_53759_Fig1_HTML.jpg]
