Hubei Hongshan Laboratory, College of Life Science and Technology, College of Biomedicine and Health, Interdisciplinary Sciences Institute, Huazhong Agricultural University, Wuhan 430070, PR China.
Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China.
Nucleic Acids Res. 2024 Jan 5;52(D1):D72-D80. doi: 10.1093/nar/gkad966.
G-quadruplexes (G4s) are non-canonical four-stranded structures that are emerging as novel genetic regulatory elements. However, a comprehensive genomic annotation of endogenous G4s (eG4s) and a systematic characterization of their regulatory network are still lacking, which poses major challenges for eG4 research. Here, we present EndoQuad (https://EndoQuad.chenzxlab.cn/), which addresses these issues by integrating high-throughput experimental data. First, based on high-quality genome-wide eG4 mapping datasets (human: 1181; mouse: 24; chicken: 2) generated by G4 ChIP-seq/CUT&Tag, we generate a reference set of genome-wide eG4s. Our multi-omics analyses show that most eG4s are identified in only one or a few cell types. The eG4s that occur in more samples are more structurally stable, more evolutionarily conserved and more enriched in promoter regions; they also mark highly expressed genes and associate with complex regulatory programs, which indicates a higher confidence level for further experiments. Finally, we integrate millions of functional genomic variants and prioritize eG4s with regulatory functions in disease and cancer contexts. These efforts have culminated in a comprehensive and interactive database of experimentally validated DNA eG4s. EndoQuad enables users to easily access, download and repurpose these data for their own research. We expect EndoQuad to serve as a one-stop resource for eG4 research and to lay the foundation for future functional studies.
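The notion of an eG4's "occurrence across samples" can be illustrated with a minimal sketch. This is not EndoQuad's actual pipeline, which the abstract does not describe; it assumes that each sample contributes a list of peak intervals in half-open (chrom, start, end) form, that overlapping peaks are chained into one merged region, and that occurrence is the number of distinct samples supporting that region. The function name `merge_and_count` and the sample names are hypothetical.

```python
def merge_and_count(peaks_by_sample):
    """Merge overlapping peaks across samples and count supporting samples.

    peaks_by_sample: dict mapping a sample name to a list of
    (chrom, start, end) tuples with half-open coordinates (an assumption
    of this sketch, not a documented EndoQuad format).
    Returns a list of (chrom, start, end, n_samples) tuples.
    """
    # Flatten to (chrom, start, end, sample) and sort by genomic position.
    records = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )

    merged = []
    for chrom, start, end, sample in records:
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start < last["end"]:
            # Overlaps the current merged region: extend it and record the sample.
            last["end"] = max(last["end"], end)
            last["samples"].add(sample)
        else:
            # No overlap: start a new merged region.
            merged.append(
                {"chrom": chrom, "start": start, "end": end, "samples": {sample}}
            )

    return [(m["chrom"], m["start"], m["end"], len(m["samples"])) for m in merged]


# Hypothetical toy input: three samples with a few peaks each.
example_peaks = {
    "sampleA": [("chr1", 100, 200), ("chr2", 50, 80)],
    "sampleB": [("chr1", 150, 250)],
    "sampleC": [("chr1", 240, 300), ("chr1", 1000, 1100)],
}
regions = merge_and_count(example_peaks)
for region in regions:
    print(region)
```

Note that chaining by overlap can join peaks that do not overlap each other directly (here, the sampleA and sampleC peaks on chr1 are linked through sampleB). A stricter definition, for example one requiring a minimum reciprocal overlap, would give different counts.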