Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation.

Authors

Linder Johannes, Srivastava Divyanshi, Yuan Han, Agarwal Vikram, Kelley David R

Affiliations

Calico Life Sciences LLC, South San Francisco, CA, USA.

mRNA Center of Excellence, Sanofi Pasteur Inc., Cambridge, MA, USA.

Publication information

Nat Genet. 2025 Apr;57(4):949-961. doi: 10.1038/s41588-024-02053-6. Epub 2025 Jan 8.

DOI:10.1038/s41588-024-02053-6
PMID:39779956
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11985352/
Abstract

Sequence-based machine-learning models trained on genomics data improve genetic variant interpretation by providing functional predictions describing their impact on the cis-regulatory code. However, current tools do not predict RNA-seq expression profiles because of modeling challenges. Here, we introduce Borzoi, a model that learns to predict cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi's predicted coverage, we isolate and accurately score DNA variant effects across multiple layers of regulation, including transcription, splicing and polyadenylation. Evaluated on quantitative trait loci, Borzoi is competitive with and often outperforms state-of-the-art models trained on individual regulatory functions. By applying attribution methods to the derived statistics, we extract cis-regulatory motifs driving RNA expression and post-transcriptional regulation in normal tissues. The wide availability of RNA-seq data across species, conditions and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.
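
The abstract describes scoring variant effects with statistics derived from predicted coverage. As a rough illustration only, the sketch below computes one such statistic: the log2 fold change in coverage summed over a gene's exon bins, comparing the alternate allele with the reference allele. The function name `expression_effect_score`, the `pseudocount` parameter, and the toy coverage values are all hypothetical; this is not Borzoi's API, and the exact statistics used in the paper may differ.

```python
import math


def expression_effect_score(ref_cov, alt_cov, exon_bins, pseudocount=1.0):
    """Log2 fold change (alt vs. ref) of coverage summed over exon bins.

    ref_cov, alt_cov: per-bin predicted coverage for one track, for the
        reference and alternate allele (hypothetical model outputs).
    exon_bins: indices of the bins that overlap the gene's exons.
    pseudocount: assumed stabilizer to avoid log(0) for silent genes.
    """
    if len(ref_cov) != len(alt_cov):
        raise ValueError("ref_cov and alt_cov must have the same length")
    ref_sum = sum(ref_cov[i] for i in exon_bins)
    alt_sum = sum(alt_cov[i] for i in exon_bins)
    return math.log2((alt_sum + pseudocount) / (ref_sum + pseudocount))


# Toy coverage over four bins; bins 1 and 2 overlap exons.
ref = [0.0, 3.0, 4.0, 0.0]
alt = [0.0, 6.0, 8.0, 0.0]  # alternate allele doubles exon coverage
score = expression_effect_score(ref, alt, exon_bins=[1, 2])
unchanged = expression_effect_score(ref, ref, exon_bins=[1, 2])
```

A positive score indicates that the alternate allele is predicted to increase coverage over the exons, and a score of zero indicates no predicted change. Analogous statistics restricted to splice junctions or 3' UTR segments could, in principle, target splicing or polyadenylation effects.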


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/ba795f904c05/41588_2024_2053_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/5930063ecdef/41588_2024_2053_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/a13de7dbab38/41588_2024_2053_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/aaee4ec7a3f4/41588_2024_2053_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/87b772527c9a/41588_2024_2053_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/25b2cd54a64e/41588_2024_2053_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/7f293f052cc0/41588_2024_2053_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/fda10f335351/41588_2024_2053_Fig8_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/2a77348b3a5e/41588_2024_2053_Fig9_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/a6300290272a/41588_2024_2053_Fig10_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/70dfe1f6c934/41588_2024_2053_Fig11_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/4acfd0745dbc/41588_2024_2053_Fig12_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/c63e35f2f181/41588_2024_2053_Fig13_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/0249b67c451f/41588_2024_2053_Fig14_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/6c608bfa5055/41588_2024_2053_Fig15_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dedc/11985352/25e154ee158d/41588_2024_2053_Fig16_ESM.jpg
