Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

Authors

Wang Yiheng, Liu Tong, Xu Dong, Shi Huidong, Zhang Chaoyang, Mo Yin-Yuan, Wang Zheng

Affiliations

School of Computing, University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA.

Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, 201 Engineering Building West, Columbia, MO 65211, USA.

Publication information

Sci Rep. 2016 Jan 22;6:19598. doi: 10.1038/srep19598.

Abstract

The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
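For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of a single denoising autoencoder layer. It is not the DeepMethyl implementation: the abstract does not give the architecture, features, or hyperparameters, so the tied weights, masking noise, layer sizes, learning rate, and the synthetic 8-bit input patterns are all assumptions chosen for illustration. The layer is trained to reconstruct the clean input from a randomly corrupted copy; a stacked version would train further layers on the hidden codes returned by `encode`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrupt(x, rate, rng):
    # Masking noise: set each input to 0 with probability `rate`.
    return [0.0 if rng.random() < rate else v for v in x]


class DenoisingAutoencoder:
    """One sigmoid layer with tied encoder/decoder weights (illustrative only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_in, self.n_hidden = n_in, n_hidden
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b_hid = [0.0] * n_hidden
        self.b_out = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_hid)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(self.n_hidden))
                        + self.b_out[i])
                for i in range(self.n_in)]

    def reconstruction_error(self, x):
        # Mean squared error between the clean input and its reconstruction.
        z = self.decode(self.encode(x))
        return sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / self.n_in

    def train_step(self, x, rate, lr, rng):
        # Encode a corrupted copy, but measure the error against the clean x.
        x_noisy = corrupt(x, rate, rng)
        h = self.encode(x_noisy)
        z = self.decode(h)
        # Gradients of 0.5 * sum((z - x)^2) w.r.t. the pre-activations.
        d_out = [(zi - xi) * zi * (1.0 - zi) for zi, xi in zip(z, x)]
        d_hid = [sum(self.W[j][i] * d_out[i] for i in range(self.n_in))
                 * h[j] * (1.0 - h[j]) for j in range(self.n_hidden)]
        for j in range(self.n_hidden):
            for i in range(self.n_in):
                # Tied weights: decoder term plus encoder term.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_noisy[i])
            self.b_hid[j] -= lr * d_hid[j]
        for i in range(self.n_in):
            self.b_out[i] -= lr * d_out[i]


def mean_error(model, data):
    return sum(model.reconstruction_error(x) for x in data) / len(data)


rng = random.Random(0)
# Synthetic stand-in data: two binary patterns, NOT methylation features.
data = [[1.0] * 4 + [0.0] * 4, [0.0] * 4 + [1.0] * 4]
dae = DenoisingAutoencoder(n_in=8, n_hidden=3, rng=rng)

loss_before = mean_error(dae, data)
for _ in range(500):
    for x in data:
        dae.train_step(x, rate=0.25, lr=0.5, rng=rng)
loss_after = mean_error(dae, data)

print("reconstruction error before:", loss_before)
print("reconstruction error after: ", loss_after)
```

In an SdA as described in the abstract, layers like this would be pre-trained one at a time and the whole stack then fine-tuned with a supervised output layer to classify each CpG site as methylated or unmethylated; that supervised step is omitted here.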

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa4/4726425/21d8ff717a82/srep19598-f1.jpg
