Fu Yile, Youness Mohamad, Virzì Alessia, Song Xinran, Tubeeckx Michiel R L, De Keulenaer Gilles W, Heidbuchel Hein, Segers Vincent F M, Sipido Karin R, Thienpont Bernard, Roderick H Llewelyn
Laboratory of Experimental Cardiology, Department of Cardiovascular Sciences, KU Leuven, Herestraat 49, 3000 Leuven, Belgium.
Laboratory for Functional Epigenetics, Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven, Belgium.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf371.
Single-nucleus RNA sequencing (snRNA-Seq) has transformed our understanding of complex tissues, providing insights into cellular composition, cell-to-cell heterogeneity in gene expression, and how both are altered in development and disease. High costs, however, constrain the number of samples analysed. Pooling samples and demultiplexing them after sequencing provides a solution, with demultiplexing based either on prior labelling with antibodies or lipid anchors conjugated to DNA barcodes (cell hashing and MULTI-seq) or on genetic differences between samples. However, there is still no comprehensive evaluation of these demultiplexing tools to guide selection between them. Here, we benchmark the leading software tools (Vireo, Souporcell, Freemuxlet, scSplit) for sample demultiplexing using genetic variants. We further compared obtaining genetic variants from SNP array analysis of gDNA and from sample-matched bulk RNA-Seq data, identified using three different variant calling tools (BCFtools, cellSNP, FreeBayes). Demultiplexing performance was evaluated on simulated multiplexed datasets comprising two, four, and six samples with doublet percentages between 0% and 30%, and validated against demultiplexing using sex-linked genes. Software implementation and execution were evaluated by run speed, robustness, scalability, and usability. Our results show that all tools except scSplit provide high recall and precision, with an accuracy of 80%-85%. Vireo achieved the best accuracy. Demultiplexing tools were differentially affected by the variant calling tool with which they were paired. For all tools, accuracy decreased as the percentage of doublets increased. We also show the deployment of demultiplexing during analysis of pooled real-world 10x RNA-Seq data from the human heart and from different species, as well as its advantages for doublet detection and removal.
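The abstract reports recall, precision, and accuracy for each demultiplexing tool but does not define how they were computed. As a rough illustration only, the sketch below scores predicted sample assignments against known ground truth, as would be available for a simulated pool. The function name `demux_metrics`, the label strings (`donor1`, `doublet`, `unassigned`), and the metric definitions are assumptions made for this example, not the authors' evaluation code.

```python
from collections import Counter


def demux_metrics(truth, predicted):
    """Score demultiplexing calls against ground-truth labels.

    Assumed definitions (not taken from the paper):
      accuracy  = fraction of barcodes whose predicted label equals the truth
      precision = correct calls for a label / all calls of that label
      recall    = correct calls for a label / all true members of that label
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("no barcodes to score")

    true_total = Counter(truth)      # how many barcodes truly carry each label
    pred_total = Counter(predicted)  # how many barcodes were called each label
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)

    accuracy = sum(correct.values()) / len(truth)
    per_label = {}
    for label in true_total:
        tp = correct[label]
        precision = tp / pred_total[label] if pred_total[label] else 0.0
        recall = tp / true_total[label]
        per_label[label] = {"precision": precision, "recall": recall}
    return accuracy, per_label


# Hypothetical toy pool: two donors plus one simulated doublet.
truth = ["donor1", "donor1", "donor2", "donor2", "doublet", "donor1"]
predicted = ["donor1", "donor1", "donor2", "donor1", "doublet", "unassigned"]

accuracy, per_label = demux_metrics(truth, predicted)
print(f"accuracy: {accuracy:.3f}")
for label, scores in sorted(per_label.items()):
    print(label, scores)
```

Treating `doublet` as an ordinary label means that missed or spurious doublet calls lower accuracy in the same way as donor mix-ups, which is one plausible way to see why accuracy would fall as the doublet percentage rises.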