Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1, Bunkyo-ku 113-0033, Japan.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae315.
Inferring cell type proportions from bulk transcriptome data is crucial in immunology and oncology. Here, we introduce guided LDA deconvolution (GLDADec), a bulk deconvolution method that guides topics using cell type-specific marker gene names to estimate topic distributions for each sample. Through benchmarking using blood-derived datasets, we demonstrate its high estimation performance and robustness. Moreover, we apply GLDADec to heterogeneous tissue bulk data and perform comprehensive cell type analysis in a data-driven manner. We show that GLDADec outperforms existing methods in estimation performance and evaluate its biological interpretability by examining enrichment of biological processes for topics. Finally, we apply GLDADec to The Cancer Genome Atlas tumor samples, enabling subtype stratification and survival analysis based on estimated cell type proportions, thus proving its practical utility in clinical settings. This approach, utilizing marker gene names as partial prior information, can be applied to various scenarios for bulk data deconvolution. GLDADec is available as an open-source Python package at https://github.com/mizuno-group/GLDADec.
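To make the idea of guiding topics with marker genes concrete, the following is a minimal sketch of a seeded LDA fitted by collapsed Gibbs sampling. It is not the GLDADec implementation or its API. The function name `guided_lda`, the parameters `seed_boost`, `alpha` and `eta`, and the toy data are all assumptions introduced for illustration. Each guided topic receives extra prior mass on its marker genes, and the per-sample topic distribution is read out as the estimated cell type proportions.

```python
import numpy as np

def guided_lda(counts, seed_genes, n_extra_topics=1, n_iter=50,
               alpha=0.1, eta=0.01, seed_boost=1.0, rng_seed=0):
    """Toy seeded LDA (hypothetical helper, not the GLDADec API).

    counts     : (n_samples, n_genes) non-negative integer array
    seed_genes : list of lists of marker gene indices, one per guided topic
    Returns an (n_samples, n_topics) array of topic proportions.
    """
    rng = np.random.default_rng(rng_seed)
    n_samples, n_genes = counts.shape
    n_topics = len(seed_genes) + n_extra_topics

    # Asymmetric topic-gene prior: extra mass on each topic's marker genes.
    prior = np.full((n_topics, n_genes), eta)
    for k, genes in enumerate(seed_genes):
        prior[k, genes] += seed_boost
    prior_sum = prior.sum(axis=1)

    # Expand the count matrix into one (sample, gene) pair per read.
    doc_ids, gene_ids = np.nonzero(counts)
    reps = counts[doc_ids, gene_ids]
    doc_ids = np.repeat(doc_ids, reps)
    gene_ids = np.repeat(gene_ids, reps)

    z = np.empty(doc_ids.size, dtype=int)   # topic assignment per token
    n_dk = np.zeros((n_samples, n_topics))  # sample-topic counts
    n_kw = np.zeros((n_topics, n_genes))    # topic-gene counts
    n_k = np.zeros(n_topics)                # tokens per topic

    # Initialise each token from the prior, so marker genes start in their topic.
    for i, (d, w) in enumerate(zip(doc_ids, gene_ids)):
        k = rng.choice(n_topics, p=prior[:, w] / prior[:, w].sum())
        z[i] = k
        n_dk[d, k] += 1
        n_kw[k, w] += 1
        n_k[k] += 1

    # Collapsed Gibbs sweeps: resample each token's topic given all others.
    for _ in range(n_iter):
        for i, (d, w) in enumerate(zip(doc_ids, gene_ids)):
            k = z[i]
            n_dk[d, k] -= 1
            n_kw[k, w] -= 1
            n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + prior[:, w]) / (n_k + prior_sum)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    theta = n_dk + alpha
    return theta / theta.sum(axis=1, keepdims=True)

# Toy usage: 4 bulk samples, 12 genes, two guided cell types plus one free topic.
toy_rng = np.random.default_rng(1)
toy_counts = toy_rng.poisson(3, size=(4, 12))
markers = [[0, 1, 2], [3, 4, 5]]  # hypothetical marker gene indices
props = guided_lda(toy_counts, markers, n_iter=20)
print(props.round(2))
```

The extra unguided topic is included so that expression not explained by any marker set has somewhere to go. How many such topics to use, and how strongly to weight the seeds, are tuning choices in this sketch rather than values taken from the paper.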