Deep5hmC: predicting genome-wide 5-hydroxymethylcytosine landscape via a multimodal deep learning model.

Affiliations

Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States.

Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, United States.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae528.

Abstract

MOTIVATION

5-Hydroxymethylcytosine (5hmC) is a crucial epigenetic mark that plays a significant role in regulating tissue-specific gene expression, and it is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remains challenging, especially given the complex interplay between DNA sequences and epigenetic factors such as histone modifications and chromatin accessibility.

RESULTS

We introduce Deep5hmC, a multimodal deep learning framework that uses tissue-specific 5hmC sequencing data and integrates the DNA sequence with epigenetic features, such as histone modifications and chromatin accessibility, to predict genome-wide 5hmC modification. For both qualitative (binary) and quantitative (continuous) 5hmC prediction, the multimodal design of Deep5hmC markedly outperforms unimodal versions of Deep5hmC and state-of-the-art machine learning methods. We benchmark this improvement on a comprehensive set of 5hmC sequencing data collected at four stages of forebrain organoid development and across 17 human tissues. Compared with DeepSEA and random forest, respectively, Deep5hmC improves the area under the receiver operating characteristic curve (AUROC) for predicting binary 5hmC modification sites by close to 4% and 17% across the four forebrain developmental stages, and by 6% and 27% across the 17 human tissues. For predicting continuous 5hmC modification, it improves the Spearman correlation coefficient by 8% and 22% across the four forebrain developmental stages, and by 17% and 30% across the 17 human tissues. Notably, Deep5hmC demonstrates its practical utility by accurately predicting gene expression and by identifying differentially hydroxymethylated regions (DhMRs) in a case-control study of Alzheimer's disease (AD). Deep5hmC significantly improves our understanding of tissue-specific gene regulation and facilitates the development of new biomarkers for complex diseases.

AVAILABILITY AND IMPLEMENTATION

Deep5hmC is available via https://github.com/lichen-lab/Deep5hmC.
