gdGSE: An algorithm to evaluate pathway enrichment by discretizing gene expression values.

Author information

Luo Jiangti, Lu Qiqi, He Mengjiao, Zhang Xiaobo, Yang Xiang, Wang Xiaosheng

Affiliations

Biomedical Informatics Research Lab, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing 211198, China.

Intelligent Pharmacy Interdisciplinary Research Center, China Pharmaceutical University, Nanjing 211198, China.

Publication information

Comput Struct Biotechnol J. 2025 May 1;27:1772-1783. doi: 10.1016/j.csbj.2025.04.038. eCollection 2025.

Abstract

We propose gdGSE, a novel computational framework for gene set enrichment analysis. Unlike conventional methods that rely on continuous gene expression values, gdGSE uses discretized gene expression profiles to assess pathway activity. This approach effectively mitigates discrepancies caused by differences in data distributions. The algorithm consists of two steps: (1) applying statistical thresholds to binarize the gene expression matrix, and (2) converting the binarized gene expression matrix into a gene set enrichment matrix. Our results demonstrate that gdGSE robustly extracts biological insights from a diverse array of simulated and real bulk and single-cell gene expression datasets. Notably, the gene set enrichment scores generated by gdGSE showed enhanced utility in downstream applications: (1) precise quantification of cancer stemness with significant prognostic relevance; (2) improved clustering performance in stratifying tumor subtypes with distinct prognoses; and (3) more accurate identification of cell types. Remarkably, the pathway activity scores generated by gdGSE showed >90% concordance with experimentally validated drug mechanisms in patient-derived xenografts and estrogen receptor-positive breast cancer cell lines. Our findings indicate that discretizing gene expression values provides an alternative approach to evaluating pathway enrichment, applicable to both bulk and single-cell data analysis.
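The abstract names the two steps of gdGSE but does not specify which statistical thresholds are used or how the binarized matrix is turned into enrichment scores. The following is a minimal sketch of the general binarize-then-score idea under two loudly labeled assumptions: a gene counts as "active" in a sample when its value exceeds that gene's median across samples, and a gene set's score in a sample is the fraction of its measured genes that are active. Both rules, and the function names `binarize` and `enrichment_matrix`, are hypothetical illustrations, not gdGSE's published method.

```python
from statistics import median


def binarize(expr: dict[str, list[float]]) -> dict[str, list[int]]:
    """Step 1 (ASSUMED rule): per gene, mark a sample 1 if the value exceeds
    that gene's median across all samples, else 0."""
    binary: dict[str, list[int]] = {}
    for gene, values in expr.items():
        threshold = median(values)
        binary[gene] = [1 if v > threshold else 0 for v in values]
    return binary


def enrichment_matrix(
    binary: dict[str, list[int]], gene_sets: dict[str, list[str]]
) -> dict[str, list[float]]:
    """Step 2 (ASSUMED rule): score each gene set in each sample as the
    fraction of its measured genes that are active (1)."""
    n_samples = len(next(iter(binary.values())))
    scores: dict[str, list[float]] = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in binary]  # ignore unmeasured genes
        if not present:
            continue  # skip gene sets with no measured genes
        scores[name] = [
            sum(binary[g][j] for g in present) / len(present)
            for j in range(n_samples)
        ]
    return scores


if __name__ == "__main__":
    # Toy data: 3 genes x 3 samples (hypothetical values).
    expr = {"A": [1.0, 5.0, 3.0], "B": [2.0, 2.5, 9.0], "C": [0.1, 0.2, 0.3]}
    gene_sets = {"S1": ["A", "B"], "S2": ["B", "C", "X"]}
    print(enrichment_matrix(binarize(expr), gene_sets))
    # {'S1': [0.0, 0.5, 0.5], 'S2': [0.0, 0.0, 1.0]}
```

Because each score depends only on whether values clear a per-gene threshold, rescaling or shifting a gene's values leaves its binary pattern unchanged. This illustrates why discretization can reduce sensitivity to differences in data distributions, whatever thresholds gdGSE itself applies.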

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6ea/12127574/db0ea67cc872/ga1.jpg