Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), C/ del Dr. Aiguader 88, Barcelona 08003, Spain.
Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona 08003, Spain.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac742.
The compartmentalization of biochemical reactions involved in the activation of gene expression in the eukaryotic nucleus leads to the formation of membraneless bodies through liquid-liquid phase separation. These bodies, called transcriptional condensates, appear to play important roles in gene regulation, as they are assembled through the association of multiple enhancer regions in 3D genomic space. To date, efficient computational methods for identifying the regions responsible for the formation of such condensates from genomic and conformational data are still lacking.
In this work, we present SEGCOND, a computational framework that aims to highlight genomic regions involved in the formation of transcriptional condensates. SEGCOND flexibly combines multiple genomic datasets related to enhancer activity and chromatin accessibility to perform a genome segmentation. It then uses this segmentation to detect highly transcriptionally active regions of the genome. As a final step, through the integration of Hi-C data, it identifies regions of putative transcriptional condensates (PTCs) as genomic domains where multiple enhancer elements coalesce in 3D space. SEGCOND identifies a subset of enhancer segments with increased transcriptional activity. PTCs are also found to significantly overlap highly interconnected enhancer elements and super enhancers obtained through two independent approaches. Application of SEGCOND to data from a well-defined system of B-cell to macrophage transdifferentiation leads to the identification of previously unreported genes with a likely role in the process.
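The last two steps described above (selecting highly active segments, then grouping those that come together in 3D space) can be illustrated with a minimal Python sketch. This is not SEGCOND's implementation: the function names, the top-fraction cutoff, the contact threshold, and the use of connected components are all assumptions made for illustration, and the inputs are assumed to be precomputed per-segment activity scores and pairwise Hi-C contact scores.

```python
from typing import Dict, List, Set, Tuple


def select_active_segments(activity: Dict[str, float],
                           top_fraction: float = 0.25) -> List[str]:
    """Return the segments with the highest activity scores.

    `activity` maps a (hypothetical) segment ID to an activity score.
    The top-fraction cutoff is an illustrative assumption.
    """
    ranked = sorted(activity, key=activity.get, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]


def candidate_ptcs(active: List[str],
                   contacts: Dict[Tuple[str, str], float],
                   min_contact: float,
                   min_size: int = 2) -> List[Set[str]]:
    """Group active segments that are linked by strong Hi-C contacts.

    Two active segments are linked when their contact score is at least
    `min_contact`. Each connected component containing `min_size` or
    more segments is reported as one candidate domain.
    """
    active_set = set(active)
    neighbours: Dict[str, Set[str]] = {seg: set() for seg in active}
    for (a, b), score in contacts.items():
        if a in active_set and b in active_set and score >= min_contact:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in active:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_size:
            groups.append(component)
    return groups
```

A call such as `candidate_ptcs(select_active_segments(scores, 0.5), contacts, min_contact=1.0)` would return the groups of active segments joined by contacts at or above the threshold. In practice, the thresholds would need to be replaced by a statistical assessment of activity and contact enrichment.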
Source code and implementation details are available at https://github.com/AntonisK95/SEGCOND.
Supplementary data are available at Bioinformatics online.