Bashkatov Artem, Andreasyan Andrey, Konovalov Dmitry, Herbert Alan, Poptsova Maria
International Laboratory of Bioinformatics, HSE University, Moscow, Russia.
InsideOutBio, Charlestown, MA, USA.
Sci Rep. 2025 Jul 2;15(1):23119. doi: 10.1038/s41598-025-07579-1.
G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons, with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for genome-wide G-flipon prediction across 14 human tissue types. The model was trained on high-confidence experimental maps of GQ-forming sequences and ATAC-seq peaks, combined with the locations of RNA polymerase, histone marks, and transcription factor binding sites. The training dataset for the DeepGQ model was derived from EndoQuad level 4-6 GQs. Model predictions were subsequently validated against the comprehensive EndoQuad dataset (levels 1-6) to optimize the whole-genome prediction threshold. To identify tissue-specific regulatory patterns, we classified GQ promoter predictions as either 'core' or 'tissue-specific'. We identified a notable overlap between predicted unique tissue-specific GQ sites and master regulatory genes (MRGs), tissue-specific DNase-hypersensitivity sites, and proteins that modulate R-loop formation. Collectively, the findings highlight the interactions between MRGs and G-flipons, mediated by RNA:DNA hybrids, that are associated with tissue specification.
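The 'core' versus 'tissue-specific' split of promoter predictions can be illustrated with a minimal Python sketch. The abstract does not state the exact criteria, so the rules here are assumptions: a promoter counts as core when a GQ is predicted in every tissue, as tissue-specific otherwise, and as unique when it is predicted in exactly one tissue. The function name `classify_promoter_gqs`, the `core_fraction` parameter, and the promoter and tissue labels are hypothetical and not taken from DeepGQ.

```python
from collections import defaultdict


def classify_promoter_gqs(predictions, core_fraction=1.0):
    """Split promoters with predicted GQs into core, tissue-specific, and unique.

    predictions: dict mapping a tissue name to the set of promoter IDs
                 that carry a predicted GQ in that tissue.
    core_fraction: assumed cutoff; a promoter is 'core' when it is predicted
                   in at least this fraction of tissues (1.0 = all tissues).
    """
    n_tissues = len(predictions)

    # Invert the mapping: promoter -> set of tissues where a GQ is predicted.
    tissues_by_promoter = defaultdict(set)
    for tissue, promoters in predictions.items():
        for promoter in promoters:
            tissues_by_promoter[promoter].add(tissue)

    core = set()
    tissue_specific = {}  # promoter -> tissues (fewer than the core cutoff)
    unique = {}           # promoter -> the single tissue where it is predicted
    for promoter, tissues in tissues_by_promoter.items():
        if len(tissues) >= core_fraction * n_tissues:
            core.add(promoter)
        else:
            tissue_specific[promoter] = tissues
            if len(tissues) == 1:
                unique[promoter] = next(iter(tissues))
    return core, tissue_specific, unique


# Toy input with three tissues and placeholder promoter IDs (not real data).
toy_predictions = {
    "liver": {"P1", "P2", "P3"},
    "brain": {"P1", "P2", "P4"},
    "heart": {"P1", "P5"},
}
core, tissue_specific, unique = classify_promoter_gqs(toy_predictions)
```

In this toy input only `P1` is predicted in all three tissues, so it is the sole core promoter, while `P3`, `P4`, and `P5` are each unique to one tissue. In a real analysis the unique set would be the one to intersect with lists of master regulatory genes or DNase-hypersensitivity sites.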