Current genomic deep learning models display decreased performance in cell type-specific accessible regions.

Affiliations

Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.

Publication information

Genome Biol. 2024 Aug 1;25(1):202. doi: 10.1186/s13059-024-03335-2.

DOI:10.1186/s13059-024-03335-2
PMID:39090688
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11293111/
Abstract

BACKGROUND

A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability.

RESULTS

We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high-capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants.

CONCLUSIONS

Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.
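
The stratified evaluation described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the metric or how specificity was binned, so the Pearson correlation, the 25%/75% thresholds, the function names, and the toy numbers below are all assumptions made only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def specificity_bin(n_accessible, n_cell_types):
    """Hypothetical binning by the fraction of cell types in which a region is accessible."""
    frac = n_accessible / n_cell_types
    if frac <= 0.25:  # assumed threshold, not taken from the paper
        return "cell type-specific"
    if frac >= 0.75:  # assumed threshold, not taken from the paper
        return "shared"
    return "intermediate"


def stratified_pearson(regions, n_cell_types):
    """Correlate observed and predicted accessibility separately within each bin.

    regions: iterable of (n_accessible, observed, predicted) tuples.
    """
    groups = {}
    for n_acc, obs, pred in regions:
        label = specificity_bin(n_acc, n_cell_types)
        observed, predicted = groups.setdefault(label, ([], []))
        observed.append(obs)
        predicted.append(pred)
    return {label: pearson(o, p) for label, (o, p) in groups.items()}


# Made-up toy values (not data from the paper), with 10 hypothetical cell types.
toy_regions = [
    (1, 0.2, 0.5), (2, 0.8, 0.4), (1, 0.5, 0.6),   # accessible in few cell types
    (9, 1.0, 2.0), (10, 2.0, 4.0), (8, 3.0, 6.0),  # accessible in most cell types
]
result = stratified_pearson(toy_regions, n_cell_types=10)
for label, r in result.items():
    print(label, r)
```

Reporting one score per bin, rather than a single genome-wide score, is what would make a drop in cell type-specific regions visible; a genome-wide average can be dominated by the more numerous or easier regions.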

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb02/11293111/995fa563e162/13059_2024_3335_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb02/11293111/ec3e27bf4156/13059_2024_3335_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb02/11293111/735e4799649a/13059_2024_3335_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb02/11293111/f6cc63f7fa59/13059_2024_3335_Fig4_HTML.jpg
