Evaluating deep learning for predicting epigenomic profiles.

Authors

Toneyan Shushan, Tang Ziqi, Koo Peter K

Affiliation

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.

Publication information

Nat Mach Intell. 2022 Dec;4(12):1088-1100. doi: 10.1038/s42256-022-00570-9. Epub 2022 Dec 5.

DOI:10.1038/s42256-022-00570-9

PMID:37324054

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10270674/

Abstract

Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modeling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modeling of epigenomic profiles leads to better generalizability and interpretability.
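The contrast between the binary and quantitative framings described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy coverage values, the `binary_targets` helper, the fixed threshold standing in for a peak caller, and the choice of binary cross-entropy and Pearson correlation as example metrics. None of it is taken from the paper's code or evaluation framework.

```python
import numpy as np


def binary_targets(coverage, threshold):
    """Label each region 1 if its mean coverage exceeds `threshold`.

    The fixed threshold is a crude, hypothetical stand-in for a peak caller.
    """
    return (coverage.mean(axis=-1) > threshold).astype(np.float64)


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)
    losses = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(losses))


def pearson_r(y_true, y_pred):
    """Pearson correlation between flattened true and predicted coverage."""
    return float(np.corrcoef(y_true.ravel(), y_pred.ravel())[0, 1])


# Toy coverage for three regions, four positions each (made-up numbers).
coverage = np.array([
    [0.0, 1.0, 0.0, 1.0],    # low signal
    [4.0, 6.0, 5.0, 5.0],    # moderate signal
    [9.0, 12.0, 10.0, 9.0],  # strong signal
])

# Binary framing: the moderate and strong regions receive the same label.
labels = binary_targets(coverage, threshold=2.0)
bce = binary_cross_entropy(labels, np.array([0.1, 0.9, 0.9]))

# Quantitative framing: the coverage values themselves are the targets,
# so the difference between the moderate and strong regions is retained.
mean_coverage = coverage.mean(axis=-1)
r = pearson_r(coverage, coverage + 0.5)
```

In this toy setup the second and third regions become indistinguishable once thresholded, while the regression targets keep their twofold difference in mean coverage.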
