Ma Xin, Thela Sai Ritesh, Zhao Fengdi, Yao Bing, Wen Zhexing, Jin Peng, Zhao Jinying, Chen Li
Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA.
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
bioRxiv. 2024 Mar 6:2024.03.04.583444. doi: 10.1101/2024.03.04.583444.
5-hydroxymethylcytosine (5hmC) is a critical epigenetic mark that plays a significant role in regulating tissue-specific gene expression, and it is essential for understanding the dynamic functions of the human genome. Using tissue-specific 5hmC sequencing data, we introduce Deep5hmC, a multimodal deep learning framework that integrates DNA sequence and histone modification information to predict genome-wide 5hmC modification. The multimodal design of Deep5hmC markedly improves prediction of both qualitative and quantitative 5hmC modification compared with unimodal versions of Deep5hmC and state-of-the-art machine learning methods. We demonstrate this improvement by benchmarking on a comprehensive set of 5hmC sequencing data collected at four time points during forebrain organoid development and across 17 human tissues. Deep5hmC also shows practical utility by accurately predicting gene expression and identifying differentially hydroxymethylated regions in a case-control study of Alzheimer's disease.
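To make the idea of integrating two modalities concrete, the following is a minimal sketch of how a DNA sequence and histone modification signals for one genomic region might be combined into a single input vector. It is not the Deep5hmC implementation: the function names, the choice of simple concatenation, and the histone marks and signal values (H3K4me1, H3K27ac) are all illustrative assumptions that do not come from the abstract.

```python
# Hypothetical sketch only: NOT the Deep5hmC architecture or code.
# It illustrates one simple way to fuse sequence and histone features.
from typing import Dict, List

BASES = "ACGT"


def one_hot_encode(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence; unknown bases (e.g. N) map to all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def fuse_modalities(seq: str, histone_signals: Dict[str, float]) -> List[float]:
    """Concatenate flattened sequence features with histone signal features.

    Histone marks are ordered by name so the feature layout is stable
    across regions (an assumption made for this illustration).
    """
    seq_features = [value for row in one_hot_encode(seq) for value in row]
    histone_features = [histone_signals[mark] for mark in sorted(histone_signals)]
    return seq_features + histone_features


# Illustrative region: a 4-bp sequence with two made-up histone signal values.
example = fuse_modalities("ACGN", {"H3K4me1": 0.8, "H3K27ac": 0.3})
print(len(example))
```

In a real multimodal model, each modality would more typically pass through its own learned encoder before the representations are merged; plain concatenation is used here only to keep the sketch short.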