Rapoport Rachel, Greenberg Avraham, Yakhini Zohar, Simon Itamar
Microbiology and Molecular Genetics, Hebrew University of Jerusalem-IMRIC, Jerusalem 9112102, Israel.
Efi Arazi School of Computer Science, Reichman University (IDC Herzliya), Herzliya 4610101, Israel.
Biology (Basel). 2024 Mar 8;13(3):175. doi: 10.3390/biology13030175.
Traditional gene set enrichment analysis falters when applied to large genomic domains, where neighboring genes often share functions. This spatial dependency creates misleading enrichments, mistaking mere physical proximity for genuine biological connections. Here we present Spatial Adjusted Gene Ontology (SAGO), a novel cyclic permutation-based approach, to tackle this challenge. SAGO separates enrichments due to spatial proximity from genuine biological links by incorporating the genes' spatial arrangement into the analysis. We applied SAGO to various datasets in which the identified genomic intervals are large, including replication timing domains, large H3K9me3 and H3K27me3 domains, HiC compartments and lamina-associated domains (LADs). Intriguingly, applying SAGO to prostate cancer samples with large copy number alteration (CNA) domains eliminated most of the enriched GO terms, thus helping to accurately identify biologically relevant gene sets linked to oncogenic processes, free from spatial bias.
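The abstract does not spell out SAGO's procedure, so the following is only a minimal sketch of a generic cyclic-permutation (circular shift) enrichment test, not the authors' implementation. It assumes genes are listed in genomic order and treated as a single circle; the function name `cyclic_shift_pvalue`, its parameters, and the toy data are all hypothetical. The idea is that shifting the domain labels around the circle keeps their spatial clustering intact, so an overlap that arises only because a compact gene cluster sits inside one large domain is no longer scored as highly significant.

```python
import random


def cyclic_shift_pvalue(selected, annotated, n_perm=1000, seed=0):
    """Empirical p-value for the overlap of two 0/1 gene flags under cyclic shifts.

    selected  -- 1 if the gene lies in a domain of interest, else 0 (genomic order)
    annotated -- 1 if the gene carries the GO term being tested, else 0
    Both lists are hypothetical inputs; this is an illustrative sketch only.
    """
    n = len(selected)
    if n != len(annotated):
        raise ValueError("selected and annotated must have the same length")
    if n < 2:
        raise ValueError("need at least two genes to shift")

    # Observed number of genes that are both in a domain and annotated.
    observed = sum(s and a for s, a in zip(selected, annotated))

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        k = rng.randrange(1, n)  # non-zero circular offset
        # Rotate the domain labels by k positions; domain sizes and clustering are preserved.
        overlap = sum(selected[(i - k) % n] and annotated[i] for i in range(n))
        if overlap >= observed:
            hits += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): 200 genes, one 40-gene domain, and a 10-gene
# functional cluster that lies entirely inside that domain.
n_genes = 200
selected = [1 if 50 <= i < 90 else 0 for i in range(n_genes)]
annotated = [1 if 60 <= i < 70 else 0 for i in range(n_genes)]

observed, p_value = cyclic_shift_pvalue(selected, annotated)
print(observed, p_value)
```

In this toy case all ten annotated genes fall in the domain, which a test that treats genes as independent would score as a strong enrichment. Under cyclic shifts a noticeable fraction of offsets still place the whole cluster inside the domain, so the empirical p-value is far less extreme.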