Sun Siqi, Yadav Shweta, Pingili Mulini, Chang Dan, Wang Jing
Genomics Research Center, AbbVie, 200 Sidney Street, Cambridge, MA 02139, United States.
Comput Struct Biotechnol J. 2025 Aug 5;27:3579-3588. doi: 10.1016/j.csbj.2025.07.058. eCollection 2025.
Cell deconvolution is a widely used method for characterizing the composition of mixed cell populations in bulk transcriptomic datasets. While both tissue- and blood-derived cell reference matrices (CRMs) are commonly used, their impact on deconvolution accuracy has not been systematically evaluated. In this study, we developed tissue- and blood-derived CRMs using single-cell RNA sequencing (scRNA-seq) data from inflammatory bowel disease (IBD). Three publicly available blood-derived CRMs (IRIS, LM22, and ImmunoStates) were included for benchmarking. Deconvolution performance was evaluated on both public bulk transcriptomic datasets and simulated pseudobulk samples, using goodness-of-fit and cell-fraction correlation as metrics. Two infliximab-treated bulk datasets were used to identify treatment-related cell types. In addition, lung adenocarcinoma (LUAD) single-cell and bulk transcriptomic datasets were used for deconvolution evaluation. We found that tissue-derived CRMs consistently outperformed blood-derived CRMs in deconvolving bulk tissue transcriptomes, exhibiting higher goodness-of-fit and more accurate cellular proportion estimates, particularly for immune and stromal cells. They also revealed more treatment-related cell types. In contrast, all CRMs performed similarly when applied to bulk blood transcriptomes. The same trends were observed in the LUAD datasets. Our results emphasize the importance of selecting appropriate CRMs for cell deconvolution of bulk tissue transcriptomes, particularly in immunology and oncology. These considerations may extend to other disease areas. The R package (DeconvRef) for building user-defined CRMs is available at https://github.com/alohasiqi/DeconvRef.
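To make the two evaluation metrics named in the abstract concrete, the following is a minimal sketch of reference-based deconvolution on a toy pseudobulk sample. It is not the authors' pipeline and does not use the DeconvRef API: the solver is a plain projected-gradient non-negative least squares chosen for illustration, and the CRM values, the true fractions, and the noise vector are all invented. Goodness-of-fit is taken here to be the Pearson correlation between observed and reconstructed expression, which is an assumption about how the metric might be defined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reconstruct(crm, weights):
    """Predicted bulk expression: weighted sum of the CRM columns."""
    return [sum(s * w for s, w in zip(row, weights)) for row in crm]


def deconvolve(crm, bulk, n_iter=500):
    """Estimate cell fractions by projected-gradient non-negative least squares."""
    n_types = len(crm[0])
    # The squared Frobenius norm bounds the largest eigenvalue of crm^T crm,
    # so this step size is small enough for the iteration to converge.
    step = 1.0 / sum(v * v for row in crm for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [p - b for p, b in zip(reconstruct(crm, w), bulk)]
        grad = [sum(row[k] * r for row, r in zip(crm, resid))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grad)]
    total = sum(w)
    return [wk / total for wk in w]  # rescale so the fractions sum to 1


# Hypothetical CRM: rows are genes, columns are three cell types (made-up values).
crm = [
    [10, 1, 0],
    [8, 0, 1],
    [1, 9, 0],
    [0, 7, 2],
    [1, 0, 12],
    [0, 2, 6],
]
true_fractions = [0.5, 0.3, 0.2]           # known only because the sample is simulated
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]  # fixed toy perturbation
pseudobulk = [v + e for v, e in zip(reconstruct(crm, true_fractions), noise)]

estimated = deconvolve(crm, pseudobulk)
gof = pearson(pseudobulk, reconstruct(crm, estimated))  # goodness-of-fit
frac_r = pearson(estimated, true_fractions)             # cell-fraction correlation

print("estimated fractions:", [round(f, 3) for f in estimated])
print("goodness-of-fit r:", round(gof, 4))
print("fraction correlation r:", round(frac_r, 4))
```

With real bulk data the true fractions are unknown, so only the goodness-of-fit can be computed directly; the cell-fraction correlation requires simulated pseudobulk samples or an independent measurement of composition.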