Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

Author information

Wang Yiheng, Liu Tong, Xu Dong, Shi Huidong, Zhang Chaoyang, Mo Yin-Yuan, Wang Zheng

Affiliations

School of Computing, University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA.

Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, 201 Engineering Building West, Columbia, MO 65211, USA.

Publication information

Sci Rep. 2016 Jan 22;6:19598. doi: 10.1038/srep19598.

DOI:10.1038/srep19598

PMID:26797014

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4726425/

Abstract

The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
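
As a rough illustration of the stacked denoising autoencoder (SdA) idea named in the abstract, the sketch below pre-trains two denoising layers greedily on synthetic binary vectors using NumPy. It is not DeepMethyl's code: the class and function names, layer sizes, tied weights, masking noise level, learning rate, and toy data are all assumptions chosen for illustration, and the supervised fine-tuning step that a classifier would need on top of the stack is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One tied-weight denoising autoencoder layer (illustrative, not DeepMethyl's code)."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
        self.b_hidden = np.zeros(n_hidden)
        self.b_visible = np.zeros(n_visible)
        self.corruption = corruption

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_hidden)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_visible)

    def loss(self, x):
        """Mean cross-entropy between the clean input and its reconstruction."""
        z = np.clip(self.decode(self.encode(x)), 1e-9, 1.0 - 1e-9)
        return float(-np.mean(np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z), axis=1)))

    def train_step(self, x, lr):
        # Masking noise: zero out a random fraction of the input entries.
        keep = self.rng.random(x.shape) >= self.corruption
        x_noisy = x * keep
        h = self.encode(x_noisy)
        z = self.decode(h)
        # Backpropagate the cross-entropy measured against the clean input.
        d_out = (z - x) / x.shape[0]
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        self.W -= lr * (x_noisy.T @ d_hid + d_out.T @ h)
        self.b_hidden -= lr * d_hid.sum(axis=0)
        self.b_visible -= lr * d_out.sum(axis=0)


def pretrain_stack(x, hidden_sizes, epochs=300, lr=0.5, seed=0):
    """Greedy layer-wise pre-training: each layer denoises the codes of the layer below."""
    layers, history = [], []
    for i, n_hidden in enumerate(hidden_sizes):
        dae = DenoisingAutoencoder(x.shape[1], n_hidden, seed=seed + i)
        before = dae.loss(x)
        for _ in range(epochs):
            dae.train_step(x, lr)
        history.append((before, dae.loss(x)))
        layers.append(dae)
        x = dae.encode(x)  # clean codes become the next layer's input
    return layers, history


# Synthetic stand-in for per-CpG feature vectors (NOT real methylation data).
rng = np.random.default_rng(42)
prototypes = (rng.random((4, 20)) < 0.3).astype(float)
X = prototypes[rng.integers(0, 4, size=200)]
flips = (rng.random(X.shape) < 0.05).astype(float)
X = np.abs(X - flips)  # flip roughly 5% of the bits

layers, history = pretrain_stack(X, hidden_sizes=[8, 4])
for depth, (before, after) in enumerate(history, start=1):
    print(f"layer {depth}: reconstruction loss {before:.3f} -> {after:.3f}")
```

In a full pipeline of this kind, the pre-trained encoders would typically initialise a feed-forward classifier that is then fine-tuned on labelled methylated/unmethylated examples; that stage is left out here to keep the sketch short.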

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fa4/4726425/21d8ff717a82/srep19598-f1.jpg
