Institute of Cardiovascular Regeneration, Goethe University Hospital.
Cardio-Pulmonary Institute, Goethe University.
Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad062.
Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model, which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits its applicability. Moreover, no available implementation allows integration with transcription factor (TF) binding information or efficient analysis of single-cell data.
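The gene-centric nature of the ABC score can be illustrated with a minimal sketch. It assumes the commonly cited form of the score, in which the product of enhancer activity and contact frequency is divided by the sum of the same product over all candidate enhancers of the gene. The function name `abc_scores` and the toy numbers are hypothetical and are not taken from any published implementation.

```python
import numpy as np

def abc_scores(activity, contact):
    """Gene-centric ABC score (illustrative sketch).

    activity: shape (n_enhancers,), activity of each candidate enhancer.
    contact:  shape (n_enhancers, n_genes), contact frequency between
              each enhancer and each gene.
    Returns a matrix of shape (n_enhancers, n_genes) in which every gene
    column with non-zero signal sums to 1.
    """
    activity = np.asarray(activity, dtype=float)
    contact = np.asarray(contact, dtype=float)
    # Activity times contact for every enhancer-gene pair.
    product = activity[:, None] * contact
    # Normalise per gene, i.e. from the gene's perspective.
    totals = product.sum(axis=0, keepdims=True)
    return np.divide(product, totals, out=np.zeros_like(product), where=totals > 0)

# Hypothetical toy data: three enhancers, two genes.
activity = np.array([10.0, 5.0, 1.0])
contact = np.array([[0.5, 0.1],
                    [0.2, 0.4],
                    [0.0, 0.5]])
scores = abc_scores(activity, contact)
```

Because the normalisation runs over the enhancers of one gene at a time, the score of an enhancer for one gene is unaffected by how many other genes that enhancer contacts, which is the limitation described above.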
We demonstrate that the ABC score can achieve higher accuracy when the enhancer activity is adapted according to the number of contacts the enhancer has to its candidate target genes and when all annotated transcription start sites of a gene are considered. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, in which we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All processing steps are incorporated into our new computational pipeline STARE.
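One way to picture the activity adaptation is sketched below. It assumes that the activity of an enhancer is distributed over its candidate target genes in proportion to its contact with each of them before the usual per-gene normalisation. This is an illustrative assumption only: the exact formula used in STARE may differ, the handling of multiple transcription start sites is not shown, and the function name `adapted_abc_scores` and the toy numbers are hypothetical.

```python
import numpy as np

def adapted_abc_scores(activity, contact):
    """ABC-style score with contact-adapted enhancer activity (sketch).

    activity: shape (n_enhancers,), activity of each candidate enhancer.
    contact:  shape (n_enhancers, n_genes), contact frequency between
              each enhancer and each gene.
    """
    activity = np.asarray(activity, dtype=float)
    contact = np.asarray(contact, dtype=float)
    # Total contact of each enhancer over all of its candidate genes.
    per_enhancer = contact.sum(axis=1, keepdims=True)
    # Share of an enhancer's contact that falls on each gene.
    share = np.divide(contact, per_enhancer,
                      out=np.zeros_like(contact), where=per_enhancer > 0)
    # Assumption: activity is split across genes according to that share.
    adapted_activity = activity[:, None] * share
    product = adapted_activity * contact
    # Per-gene normalisation, as in the original score.
    totals = product.sum(axis=0, keepdims=True)
    return np.divide(product, totals, out=np.zeros_like(product), where=totals > 0)

# Hypothetical toy data: three enhancers, two genes.
activity = np.array([10.0, 5.0, 1.0])
contact = np.array([[0.5, 0.1],
                    [0.2, 0.4],
                    [0.0, 0.5]])
adapted = adapted_abc_scores(activity, contact)
```

In this sketch, an enhancer that contacts only one gene keeps its full activity for that gene, whereas an enhancer that contacts many genes contributes a smaller share to each of them.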
The software is available at https://github.com/schulzlab/STARE.
marcel.schulz@em.uni-frankfurt.de.
Supplementary data are available at Bioinformatics online.