Zheng Cheng, Wang Man, Yamada Ryo, Okada Daigo
Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan.
Department of Signal Transduction, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 5650871, Osaka, Japan.
Comput Struct Biotechnol J. 2023 Oct 11;21:4988-5002. doi: 10.1016/j.csbj.2023.09.042. eCollection 2023.
Gene sets are functional units of living cells. Few studies have investigated the complex relations among gene sets, and how those relations change across biological conditions remains poorly documented. In this study, we adopted and modified a classical k-nearest neighbor-based association function to detect inter-gene-set similarities. Based on this method, we built multiplex networks of gene sets for the first time; each layer of such a network corresponds to a different population of cells. As demonstrated in this study, the context-based multiplex networks capture meaningful biological variation and differ considerably from knowledge-based gene set networks built on Jaccard similarity. Furthermore, at the level of individual gene sets, structural coefficients (multiplex PageRank centrality, clustering coefficient, and participation coefficient) reveal the diversity of gene sets in terms of structural properties and make it easier to identify unique gene sets. Gene set enrichment analysis (GSEA) treats each gene set independently and ignores its contextual and relational attributes. The structural coefficients can supplement GSEA with information about the overall landscape of gene sets, supporting a constructive reorganization of enriched terms and helping researchers better prioritize and select gene sets.
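The abstract contrasts the context-based networks with knowledge-based networks built on Jaccard similarity. As a minimal sketch of that baseline only, the following Python computes pairwise Jaccard similarity between gene sets and keeps the nonzero pairs as weighted edges. The function names, the `threshold` parameter, and the toy gene sets are illustrative assumptions; the abstract does not specify how the authors built or filtered their Jaccard network.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of gene symbols."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0 to avoid division by zero.
        return 0.0
    return len(a & b) / len(union)


def jaccard_network(gene_sets, threshold=0.0):
    """Return {(name_a, name_b): weight} for pairs whose similarity exceeds threshold.

    `threshold` is a hypothetical filter added for illustration.
    """
    edges = {}
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        weight = jaccard(gene_sets[name_a], gene_sets[name_b])
        if weight > threshold:
            edges[(name_a, name_b)] = weight
    return edges


# Toy gene sets, invented for this example (not taken from the study).
toy_sets = {
    "SET_A": {"TP53", "MDM2", "CDKN1A"},
    "SET_B": {"TP53", "CDKN1A", "BAX"},
    "SET_C": {"GAPDH", "PGK1"},
}

edges = jaccard_network(toy_sets)
print(edges)
```

Because such a network depends only on gene membership, it is identical for every cell population, which is why it serves as a knowledge-based reference point against the context-based multiplex layers described in the abstract.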