Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.

Authors

Kelley David R, Snoek Jasper, Rinn John L

Affiliations

Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA;

School of Engineering and Applied Science, Harvard University, Cambridge, Massachusetts 02138, USA.

Publication information

Genome Res. 2016 Jul;26(7):990-9. doi: 10.1101/gr.200535.115. Epub 2016 May 3.

Abstract

The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by many noncoding variants statistically associated with human disease, nearly all such variants have unknown mechanisms. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce the open source package Basset to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basset on a compendium of accessible genomic sites mapped in 164 cell types by DNase-seq and demonstrate greater predictive accuracy than previous methods. Basset's predicted change in accessibility between variant alleles was far greater for genome-wide association study (GWAS) SNPs that are likely to be causal than for nearby SNPs in linkage disequilibrium with them. With Basset, a researcher can perform a single sequencing assay in their cell type of interest and simultaneously learn that cell's chromatin accessibility code and annotate every mutation in the genome with its influence on present accessibility and latent potential for accessibility. Thus, Basset offers a powerful computational approach to annotate and interpret the noncoding genome.
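
The variant-scoring idea in the abstract (predict accessibility for each allele, then compare) can be illustrated with a minimal Python sketch. It is not Basset's code or architecture: the single convolution layer, the random untrained weights, the example sequence, and the names `one_hot`, `toy_accessibility`, and `snp_delta` are all assumptions made for illustration only.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def toy_accessibility(x, filters, weights):
    """Hypothetical stand-in for a trained CNN: conv -> ReLU -> global max pool -> sigmoid."""
    width = filters.shape[1]
    n_pos = x.shape[0] - width + 1
    # Slide a window over the sequence: shape (n_pos, width, 4).
    windows = np.stack([x[i:i + width] for i in range(n_pos)])
    # Score every window against every filter: shape (n_pos, n_filters).
    acts = np.einsum("pwc,fwc->pf", windows, filters)
    pooled = np.maximum(acts, 0.0).max(axis=0)
    return 1.0 / (1.0 + np.exp(-(pooled @ weights)))

def snp_delta(seq, pos, alt, filters, weights):
    """Predicted accessibility of the alternate allele minus that of the reference allele."""
    ref_score = toy_accessibility(one_hot(seq), filters, weights)
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_score = toy_accessibility(one_hot(alt_seq), filters, weights)
    return alt_score - ref_score

# Random weights stand in for a trained model (assumption: 8 filters of width 5).
rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 5, 4)).astype(np.float32)
weights = rng.normal(size=8).astype(np.float32)

seq = "ACGTTGCAAGCTTGACCGTA"  # made-up sequence, reference base C at index 10
delta = snp_delta(seq, 10, "T", filters, weights)
print(f"predicted change in accessibility (alt - ref): {delta:+.4f}")
```

In a real analysis the score would come from a model trained on DNase-seq data, and the difference would be computed per cell type; the sketch only shows the reference-versus-alternate comparison.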

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dc8/4937568/37394a7088f0/990f01.jpg
