Chromatin accessibility prediction via a hybrid deep convolutional neural network.

Author information

MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication information

Bioinformatics. 2018 Mar 1;34(5):732-738. doi: 10.1093/bioinformatics/btx679.

Abstract

MOTIVATION

Most known genetic variants associated with inherited human diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole-genome level and to precisely decipher their implications. Although computational approaches have complemented high-throughput biological experiments in annotating the human genome, it remains a major challenge to accurately annotate regulatory elements in a specific cell type by automatically learning the DNA sequence code from large-scale sequencing data. Developing an accurate and interpretable model that learns DNA sequence signatures and enables the identification of causative genetic variants has therefore become essential in both genomic and genetic studies.

RESULTS

We propose Deopen, a hybrid framework based mainly on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, our model shows superior performance both in classifying accessible regions against randomly sampled background sequences and in regressing DNase-seq signals. We further visualize the convolutional kernels and show that the identified sequence signatures match known motifs. Finally, we demonstrate the sensitivity of our model in finding causative non-coding variants in an analysis of a breast cancer dataset. We expect Deopen to be widely applied, with either public or in-house chromatin accessibility data, to the annotation of the human genome and the identification of non-coding variants associated with diseases.
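
To make concrete the idea that a convolutional kernel can act as a motif detector on DNA sequence, here is a minimal toy sketch in plain Python. It is not the Deopen implementation: the abstract does not specify the architecture, so the function names (`one_hot`, `conv1d_scan`), the example sequence, and the hand-built "TATA" kernel are all hypothetical, and a real model would learn its kernels from data rather than have them set by hand.

```python
# Illustrative sketch only; NOT the Deopen implementation.
# All names and the toy kernel below are assumptions made for this example.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats in A, C, G, T order.

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the encoded sequence.

    Returns one activation per fully overlapping window
    (no padding, stride 1, no bias, no nonlinearity).
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            weights = kernel[offset]
            score += sum(r * w for r, w in zip(row, weights))
        scores.append(score)
    return scores


# Hypothetical kernel: weight 1 on each base of the pattern "TATA".
toy_kernel = one_hot("TATA")

sequence = "GGCTATAAGC"
activations = conv1d_scan(one_hot(sequence), toy_kernel)

# The window with the highest activation is where the pattern matches best.
best_position = max(range(len(activations)), key=activations.__getitem__)
print(activations, best_position)
```

In this toy setting the activation at each position simply counts how many bases in the window agree with the kernel's pattern, which is why inspecting learned kernel weights can reveal sequence signatures that resemble known motifs.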

AVAILABILITY AND IMPLEMENTATION

Deopen is freely available at https://github.com/kimmo1019/Deopen.

CONTACT

ruijiang@tsinghua.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
