Department of Automation, Xiamen University, Xiamen, China.
Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai, China.
BMC Genomics. 2024 Sep 18;25(1):875. doi: 10.1186/s12864-024-10728-x.
BACKGROUND: Bulk RNA-seq, although widely adopted, measures gene expression averaged across cells. This averaging masks cell type heterogeneity and confounds downstream analyses. Identifying the cellular composition and the cell type-specific gene expression profiles (GEPs) therefore helps in studying the mechanisms underlying various biological processes. Single-cell RNA-seq resolves cell type heterogeneity in gene expression, but it requires specialized and expensive resources and is currently impractical for large numbers of samples or for routine clinical settings. Computational deconvolution methods have recently been developed, but many of them estimate only one of the two quantities, cell type composition or cell type-specific GEPs, and require the other as input. More accurate deconvolution methods that infer both cell type abundance and cell type-specific GEPs are still needed.
RESULTS: We propose a new deconvolution algorithm, DSSC, which simultaneously infers cell type-specific gene expression and cell type proportions of heterogeneous samples by leveraging gene-gene and sample-sample similarities in bulk expression and single-cell RNA-seq data. Through comparisons with existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs across simulated pseudo-bulk data (including intra-dataset and inter-dataset simulations) and experimental bulk data (including mixture data and real experimental data). DSSC is robust to changes in the number of marker genes and in sample size, and it is also cost- and time-efficient.
CONCLUSIONS: DSSC provides a practical and promising alternative to experimental techniques for characterizing the cellular composition and gene expression heterogeneity of heterogeneous samples.
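To make the deconvolution setting concrete, the following is a minimal sketch of the linear mixing model that bulk deconvolution methods commonly assume: bulk expression (genes x samples) is approximately a signature matrix (genes x cell types) times a proportion matrix (cell types x samples). It is not the DSSC algorithm. It holds the signature fixed and estimates only the proportions, which is the one-sided setting the abstract contrasts DSSC with, and it uses none of the gene-gene or sample-sample similarity information. The function names, the toy signature, and the multiplicative-update solver are all illustrative assumptions.

```python
import numpy as np


def make_signature(n_types=3, markers_per_type=20, seed=0):
    """Toy signature matrix (genes x cell types); hypothetical data, not from the paper.

    Each cell type gets its own block of strongly expressed marker genes
    on top of a low random background.
    """
    rng = np.random.default_rng(seed)
    n_genes = n_types * markers_per_type
    signature = rng.uniform(0.0, 1.0, size=(n_genes, n_types))
    for k in range(n_types):
        rows = slice(k * markers_per_type, (k + 1) * markers_per_type)
        signature[rows, k] += 10.0
    return signature


def simulate_pseudo_bulk(signature, proportions):
    """Noise-free pseudo-bulk: each sample is a proportion-weighted sum of the GEPs."""
    return signature @ proportions


def estimate_proportions(bulk, signature, n_iter=2000, eps=1e-12):
    """Non-negative least squares via multiplicative updates, signature held fixed.

    Minimizes ||bulk - signature @ P||^2 subject to P >= 0, then rescales
    each sample's column of P to sum to 1.
    """
    n_types = signature.shape[1]
    proportions = np.full((n_types, bulk.shape[1]), 1.0 / n_types)
    cross = signature.T @ bulk       # cell types x samples
    gram = signature.T @ signature   # cell types x cell types
    for _ in range(n_iter):
        proportions *= cross / (gram @ proportions + eps)
    return proportions / (proportions.sum(axis=0, keepdims=True) + eps)


if __name__ == "__main__":
    # Columns are samples; each column sums to 1.
    true_props = np.array([[0.6, 0.2, 0.1],
                           [0.3, 0.5, 0.2],
                           [0.1, 0.3, 0.7]])
    sig = make_signature()
    bulk = simulate_pseudo_bulk(sig, true_props)
    print(np.round(estimate_proportions(bulk, sig), 3))
```

A method that infers both quantities at once, as DSSC is described as doing, would treat the signature as unknown too, which makes the problem considerably harder than this sketch suggests.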