School of Life Sciences, University of Nevada, Las Vegas, NV 89154, United States.
Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, NV 89154, United States.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae367.
Understanding the rules that govern enhancer-driven transcription remains a central unsolved problem in genomics. With multiple massively parallel enhancer perturbation assays now published, enough data are available to learn to predict enhancer-promoter (EP) relationships in a data-driven manner.
We applied machine learning to one of the largest enhancer perturbation studies, integrated with transcription factor (TF) and histone modification ChIP-seq data. The results uncovered a discrepancy in prediction between genome-wide data and data from targeted experiments. Relative strength of contact was important for prediction, confirming the basic principle of EP regulation. Novel features, such as the density of enhancers and promoters in the genomic region, were also found to be important, highlighting our limited understanding of how other elements in the region contribute to regulation. Several TF peaks were identified that improved prediction by identifying the negatives and reducing false positives. In summary, integrating genomic assays with enhancer perturbation studies increased the accuracy of the model and provided novel insights into enhancer-driven transcription.
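As an illustration of the two feature types named above, the following minimal sketch computes a relative contact strength and a local element density for candidate EP pairs. The abstract does not give the exact definitions used in the study, so everything here is an assumption made for illustration: the function names, the input layout, normalizing each contact by the total contact over all candidate enhancers of the same gene, and the 100 kb window.

```python
from collections import defaultdict


def relative_contact(pairs):
    """Return contact strength of each enhancer-gene pair as a fraction of
    the gene's total contact over all its candidate enhancers.

    `pairs` is a list of dicts with keys 'gene', 'enhancer', 'contact'
    (a hypothetical layout, not the study's data format).
    """
    totals = defaultdict(float)
    for p in pairs:
        totals[p["gene"]] += p["contact"]

    rel = {}
    for p in pairs:
        total = totals[p["gene"]]
        # Guard against genes with no recorded contact at all.
        rel[(p["gene"], p["enhancer"])] = p["contact"] / total if total > 0 else 0.0
    return rel


def element_density(positions, center, window=100_000):
    """Return the number of elements per kb within +/- `window` bp of `center`.

    `positions` holds genomic coordinates of enhancers or promoters on one
    chromosome; the window size is an assumed, illustrative value.
    """
    n_in_window = sum(1 for pos in positions if abs(pos - center) <= window)
    return n_in_window / (2 * window / 1000)


if __name__ == "__main__":
    toy_pairs = [
        {"gene": "G1", "enhancer": "e1", "contact": 3.0},
        {"gene": "G1", "enhancer": "e2", "contact": 1.0},
    ]
    print(relative_contact(toy_pairs))
    print(element_density([100, 50_000, 300_000], center=0))
```

Features like these would then be combined with TF and histone ChIP-seq signals as inputs to a classifier; the choice of classifier is not specified in the abstract.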
The trained models, data, and the source code are available at http://doi.org/10.5281/zenodo.11290386 and https://github.com/HanLabUNLV/sleps.