Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Calico Life Sciences LLC, South San Francisco, CA 94080, USA.

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.

Publication information

Cell Rep. 2020 May 19;31(7):107663. doi: 10.1016/j.celrep.2020.107663.

Abstract

Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
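
For readers unfamiliar with the two quantities the abstract leans on, the following is a minimal illustrative sketch, not the Xpresso implementation. It assumes "variation explained" means a coefficient of determination computed as 1 minus the ratio of residual to total sum of squares (the abstract does not say which definition was used), and it counts CpG dinucleotides on a single strand. The function names `cpg_count` and `variance_explained` are hypothetical and chosen only for this example.

```python
# Illustrative only: hypothetical helpers, not code from the paper.

def cpg_count(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) on the given strand."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")


def variance_explained(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy promoter fragment and toy expression values, made up for illustration.
    print(cpg_count("ACGCGT"))
    print(variance_explained([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Under this definition, a value of 0.59 would mean that predictions account for 59% of the variance in observed expression levels across genes.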
