Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states.

Author information

Murphy Alan E, Askarova Aydan, Lenhard Boris, Skene Nathan G, Marzi Sarah J

Affiliations

UK Dementia Research Institute at Imperial College London, 86 Wood Lane, London W12 0BZ, UK.

Department of Brain Sciences, Imperial College London, 86 Wood Lane, London W12 0BZ, UK.

Publication information

Nucleic Acids Res. 2025 Feb 8;53(4). doi: 10.1093/nar/gkae1212.

Abstract

To understand the complex relationship between histone mark activity and gene expression, recent studies have used in silico predictions based on large-scale machine learning models. However, these approaches have omitted key contributing factors like cell state, histone mark function or distal effects, which impact the relationship, limiting their findings. Moreover, downstream use of these models for new biological insight is lacking. Here, we present the most comprehensive study of this relationship to date, investigating seven histone marks in eleven cell types across a diverse range of cell states. We used convolutional and attention-based models to predict transcription from histone mark activity at promoters and distal regulatory elements. Our work shows that histone mark function, genomic distance and cellular states collectively influence a histone mark's relationship with transcription. We found that no individual histone mark is consistently the strongest predictor of gene expression across all genomic and cellular contexts. This highlights the need to consider all three factors when determining the effect of histone mark activity on transcriptional state. Furthermore, we conducted in silico histone mark perturbation assays, uncovering functional and disease-related loci and highlighting frameworks for the use of chromatin deep learning models to uncover new biological insight.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b753/11879020/0eaae1738846/gkae1212figgra1.jpg
