Vathrakokoili Pournara Anna, Miao Zhichao, Beker Ozgur Yilimaz, Nolte Nadja, Brazma Alvis, Papatheodorou Irene
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
Open Targets, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom.
Bioinform Adv. 2024 Mar 23;4(1):vbae048. doi: 10.1093/bioadv/vbae048. eCollection 2024.
Cell-type deconvolution methods aim to infer cell composition from bulk transcriptomic data. The proliferation of such methods, coupled with the inconsistent results they produce in many cases, highlights the pressing need for guidance in selecting an appropriate method. Additionally, the growing accessibility of single-cell RNA sequencing datasets, often accompanied by bulk expression data from related samples, enables the benchmarking of existing methods.
In this study, we conduct a comprehensive assessment of 31 methods, utilizing single-cell RNA sequencing data from diverse human and mouse tissues. Employing various simulation scenarios, we reveal the efficacy of regression-based deconvolution methods and highlight their sensitivity to the choice of reference. We investigate the impact of differences between bulk and reference, incorporating variables such as sample, study and technology. We provide validation using a gold standard dataset from mononuclear cells and suggest a consensus prediction of proportions when ground truth is not available. We validate the consensus method on data from the stomach and study its spillover effect. Importantly, we propose the use of the critical assessment of transcriptomic deconvolution (CATD) pipeline, which encompasses functionalities for generating references and pseudo-bulks and for running the implemented deconvolution methods. CATD streamlines the simultaneous deconvolution of numerous bulk samples, providing a practical solution for speeding up the evaluation of newly developed methods.
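The benchmarking idea described in the abstract (simulate pseudo-bulks with known composition from single-cell data, deconvolve them against a reference, and compare with the truth) can be illustrated with a minimal sketch. This is not the CATD pipeline or any of the 31 benchmarked methods: the function names `make_pseudo_bulk`, `build_reference` and `deconvolve`, the toy Poisson data, and the choice of ordinary least squares with negative coefficients clipped to zero are all assumptions made for illustration only.

```python
import numpy as np


def make_pseudo_bulk(counts, labels, cell_types, n_cells, rng):
    """Sum randomly drawn cells into one pseudo-bulk profile (hypothetical helper).

    counts: (cells x genes) single-cell expression matrix
    labels: (cells,) integer cell-type label per cell
    Returns the pseudo-bulk vector and the true cell-type proportions.
    """
    picked = rng.choice(counts.shape[0], size=n_cells, replace=True)
    bulk = counts[picked].sum(axis=0)
    truth = np.array([(labels[picked] == t).mean() for t in cell_types])
    return bulk, truth


def build_reference(counts, labels, cell_types):
    """Mean expression per cell type as a (genes x cell types) matrix."""
    return np.stack([counts[labels == t].mean(axis=0) for t in cell_types], axis=1)


def deconvolve(bulk, reference):
    """Least-squares fit, negatives clipped to zero, rescaled to sum to 1.

    A simplification chosen for this sketch; it stands in for the
    regression-based family and is not a specific published method.
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    return coef / total if total > 0 else np.full_like(coef, 1.0 / coef.size)


# Toy single-cell data (assumed, not from the study): 3 cell types, 50 genes.
rng = np.random.default_rng(0)
cell_types = np.array([0, 1, 2])
profiles = rng.gamma(shape=2.0, scale=5.0, size=(3, 50))  # mean expression per type
labels = rng.integers(0, 3, size=600)
counts = rng.poisson(profiles[labels]).astype(float)

reference = build_reference(counts, labels, cell_types)
bulk, truth = make_pseudo_bulk(counts, labels, cell_types, n_cells=300, rng=rng)
estimate = deconvolve(bulk, reference)

# Root-mean-square error between estimated and true proportions.
rmse = float(np.sqrt(np.mean((estimate - truth) ** 2)))
print("truth   :", np.round(truth, 3))
print("estimate:", np.round(estimate, 3))
print("RMSE    :", round(rmse, 4))
```

In this sketch the reference and the pseudo-bulk are built from the same cells, which is the most favourable setting. Drawing them from different samples, studies or technologies would be the natural way to probe the sensitivity to reference choice that the abstract reports.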