Chromatin accessibility prediction via a hybrid deep convolutional neural network.

Author information

MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication information

Bioinformatics. 2018 Mar 1;34(5):732-738. doi: 10.1093/bioinformatics/btx679.

DOI:10.1093/bioinformatics/btx679

PMID:29069282

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6192215/

Abstract

MOTIVATION

A majority of known genetic variants associated with inherited human diseases lie in non-coding regions that lack adequate interpretation, so it is essential to systematically discover functional sites at the whole-genome level and to decipher their implications precisely and comprehensively. Although computational approaches have complemented high-throughput biological experiments in annotating the human genome, it remains a major challenge to accurately annotate regulatory elements in a specific cell type by automatically learning the DNA sequence code from large-scale sequencing data. An accurate and interpretable model that learns DNA sequence signatures and enables the identification of causative genetic variants has therefore become essential in both genomic and genetic studies.

RESULTS

We propose Deopen, a hybrid framework based mainly on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model not only in classifying accessible regions against randomly sampled background sequences, but also in the regression of DNase-seq signals. We further visualize the convolutional kernels and show that the identified sequence signatures match known motifs. Finally, we demonstrate the sensitivity of our model in finding causative non-coding variants in the analysis of a breast cancer dataset. We expect Deopen to be widely applied, with either public or in-house chromatin accessibility data, to the annotation of the human genome and the identification of non-coding variants associated with diseases.
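As a rough illustration of why a convolutional kernel can be compared with a known motif, the sketch below one-hot encodes a DNA sequence and slides a single kernel along it. It is not Deopen's code: the function names, the hand-written `TATA_KERNEL` weights, and the toy sequence are assumptions made for this example, whereas a trained network would learn many such kernels from data.

```python
# Illustrative sketch only; not taken from the Deopen repository.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).
    Unknown bases such as N become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence (valid padding, stride 1)
    and return the raw activation at each start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            weights = kernel[offset]
            total += sum(r * w for r, w in zip(row, weights))
        scores.append(total)
    return scores

def relu_max_pool(scores):
    """ReLU followed by global max pooling: the kernel's strongest match."""
    return max((max(s, 0.0) for s in scores), default=0.0)

# Hypothetical hand-written kernel that rewards the 4-mer "TATA".
# Columns follow BASES order: A, C, G, T.
TATA_KERNEL = [
    [-1.0, -1.0, -1.0, 1.0],  # position 1 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 2 prefers A
    [-1.0, -1.0, -1.0, 1.0],  # position 3 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 4 prefers A
]

sequence = "GCGTATAGC"
scores = conv1d_scan(one_hot(sequence), TATA_KERNEL)
best_position = scores.index(max(scores))
print(best_position, relu_max_pool(scores))  # prints: 3 4.0
```

Read column by column, the kernel weights act like a small position weight matrix, which is what makes it possible to compare learned kernels with motif databases.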

AVAILABILITY AND IMPLEMENTATION

Deopen is freely available at https://github.com/kimmo1019/Deopen.

CONTACT

ruijiang@tsinghua.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
