Supervised learning of enhancer-promoter specificity based on genome-wide perturbation studies highlights areas for improvement in learning.

Affiliations

School of Life Sciences, University of Nevada, Las Vegas, NV 89154, United States.

Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, United States.

Publication information

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae367.

DOI: 10.1093/bioinformatics/btae367

PMID: 38870532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11211214/

Abstract

MOTIVATION

Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. Now that multiple massively parallel enhancer perturbation assays have been published, there are enough data to learn to predict enhancer-promoter (EP) relationships in a data-driven manner.

RESULTS

We applied machine learning to one of the largest enhancer perturbation studies, integrated with transcription factor (TF) and histone modification ChIP-seq. The results uncovered a discrepancy in prediction performance on genome-wide data compared with data from targeted experiments. The relative strength of contact was important for prediction, confirming the basic principle of EP regulation. Novel features, such as the density of enhancers/promoters in the genomic region, were found to be important, highlighting our lack of understanding of how other elements in the region contribute to regulation. Several TF peaks were identified that improved the prediction by identifying negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model and provided novel insights into enhancer-driven transcription.
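The Results paragraph names three kinds of predictive signal for a candidate EP pair: the relative strength of contact, the density of enhancers/promoters around the gene, and overlap with TF peaks. The code below is a minimal sketch of how such features could be computed for candidate pairs, and how false positives could be counted against perturbation labels. It is not the authors' pipeline; their code is in the sleps repository. All names (`Candidate`, `relative_contact`, `element_density`, `overlaps_peak`, `build_features`, `count_false_positives`) are hypothetical. Several modeling choices are assumptions: contact is normalized by the per-gene total, loosely in the spirit of the Activity-by-Contact normalization; density is counted within a fixed window around the TSS; and TF overlap is a single binary flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate enhancer-gene pair (hypothetical input format)."""
    enhancer: str
    gene: str
    enh_start: int   # enhancer interval start (bp), half-open [start, end)
    enh_end: int     # enhancer interval end (bp)
    tss: int         # TSS position of the gene (bp)
    contact: float   # assumed contact frequency, e.g. derived from Hi-C


def relative_contact(cands):
    """Contact of each pair divided by the summed contact of all candidates for the same gene."""
    totals = defaultdict(float)
    for c in cands:
        totals[c.gene] += c.contact
    return {
        (c.enhancer, c.gene): c.contact / totals[c.gene] if totals[c.gene] > 0 else 0.0
        for c in cands
    }


def element_density(positions, center, window):
    """Number of elements (enhancers or promoters) within +/- window bp of center."""
    return sum(1 for x in positions if abs(x - center) <= window)


def overlaps_peak(start, end, peaks):
    """True if [start, end) overlaps any half-open TF peak interval [s, e)."""
    return any(s < end and start < e for s, e in peaks)


def build_features(cands, enhancer_mids, promoter_tss, tf_peaks, window=100_000):
    """Assemble one feature dict per candidate pair; the window size is an arbitrary assumption."""
    rel = relative_contact(cands)
    features = {}
    for c in cands:
        key = (c.enhancer, c.gene)
        features[key] = {
            "relative_contact": rel[key],
            "enhancer_density": element_density(enhancer_mids, c.tss, window),
            "promoter_density": element_density(promoter_tss, c.tss, window),
            "tf_peak": overlaps_peak(c.enh_start, c.enh_end, tf_peaks),
        }
    return features


def count_false_positives(scores, labels, threshold):
    """Pairs scored >= threshold that the (perturbation) labels mark as negative."""
    return sum(1 for key, s in scores.items() if s >= threshold and not labels[key])


if __name__ == "__main__":
    # Made-up toy inputs and labels, not data from the study.
    cands = [
        Candidate("E1", "G1", 900, 1100, 5000, 8.0),
        Candidate("E2", "G1", 19900, 20100, 5000, 2.0),
        Candidate("E3", "G2", 29900, 30100, 40000, 5.0),
    ]
    feats = build_features(cands, [1000, 20000, 30000], [5000, 40000],
                           [(1050, 1300)], window=10_000)
    labels = {("E1", "G1"): True, ("E2", "G1"): False, ("E3", "G2"): False}
    scores = {k: f["relative_contact"] for k, f in feats.items()}
    for k, f in feats.items():
        print(k, f)
    print("false positives at 0.5:", count_false_positives(scores, labels, 0.5))
```

With these toy inputs, E3 gets a relative contact of 1.0 only because it is the sole candidate listed for G2. Thresholding that single feature at 0.5 therefore yields one false positive. Cases like this are where additional features, such as element density or TF-peak overlap, could in principle help a classifier separate true from false pairs.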

AVAILABILITY AND IMPLEMENTATION

The trained models, data, and the source code are available at http://doi.org/10.5281/zenodo.11290386 and https://github.com/HanLabUNLV/sleps.


Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/763c/11211214/5e71621cec72/btae367f1.jpg

