Zeng Haoyang, Gifford David K
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
Nucleic Acids Res. 2017 Jun 20;45(11):e99. doi: 10.1093/nar/gkx177.
DNA methylation plays a crucial role in the establishment of tissue-specific gene expression and the regulation of key biological processes. However, our present inability to predict the effect of genome sequence variation on DNA methylation precludes a comprehensive assessment of the consequences of non-coding variation. We introduce CpGenie, a sequence-based framework that learns a regulatory code of DNA methylation using a deep convolutional neural network and uses this network to predict the impact of sequence variation on proximal CpG site DNA methylation. CpGenie produces allele-specific DNA methylation prediction with single-nucleotide sensitivity that enables accurate prediction of methylation quantitative trait loci (meQTL). We demonstrate that CpGenie prioritizes validated GWAS SNPs, and contributes to the prediction of functional non-coding variants, including expression quantitative trait loci (eQTL) and disease-associated mutations. CpGenie is publicly available to assist in identifying and interpreting regulatory non-coding variants.
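The abstract describes scoring a variant by comparing sequence-based methylation predictions for the two alleles. The sketch below illustrates that general idea only. It is not CpGenie's architecture or code: the function names (`one_hot`, `toy_methylation_score`, `variant_effect`), the random untrained filters, the window size, and the example sequence are all hypothetical stand-ins for a trained convolutional network.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    x = np.zeros((len(seq), 4), dtype=float)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x


def toy_methylation_score(x, filters, bias=0.0):
    """Hypothetical stand-in for a trained network.

    Applies a 1D convolution, ReLU, global max pooling over positions,
    then a sigmoid, so the output lies between 0 and 1.
    filters has shape (n_filters, kernel_width, 4).
    """
    k = filters.shape[1]
    # Slide a window of width k along the sequence: (n_windows, k, 4).
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    activations = np.maximum(np.einsum("wkc,fkc->wf", windows, filters), 0.0)
    pooled = activations.max(axis=0)
    return float(1.0 / (1.0 + np.exp(-(pooled.mean() + bias))))


def variant_effect(ref_seq, pos, alt_base, filters):
    """Return the predicted score for the alternate allele minus the reference allele."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_score = toy_methylation_score(one_hot(ref_seq), filters)
    alt_score = toy_methylation_score(one_hot(alt_seq), filters)
    return alt_score - ref_score


# Assumed toy inputs: random (untrained) filters and a made-up sequence.
rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 5, 4))
ref = "ACGTTGCACGGATCCGTAACG"
delta = variant_effect(ref, 10, "T", filters)
print(f"predicted change in methylation score: {delta:+.4f}")
```

With a trained model in place of the toy scorer, the sign and magnitude of `delta` would serve as the allele-specific effect estimate; with random filters, as here, the value carries no biological meaning.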