Li Terence, Alvarez Marcus, Liu Cuining, Abuhanna Kevin, Sun Yu, Ernst Jason, Plath Kathrin, Balliu Brunilda, Luo Chongyuan, Zaitlen Noah
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles.
Bioinformatics Graduate Program, University of California, Los Angeles.
bioRxiv. 2025 Feb 8:2025.02.06.636969. doi: 10.1101/2025.02.06.636969.
Sample multiplexing has become an increasingly common design choice in droplet-based single-nucleus multi-omic sequencing experiments to reduce costs and remove technical variation. Genotype-based demultiplexing is one popular class of methods that was originally developed for single-cell RNA-seq but has not been rigorously benchmarked in other assays, such as snATAC-seq and joint snRNA/snATAC assays, especially in the context of variable ambient RNA/DNA contamination. To address this, we develop ambisim, a genotype-aware read-level simulator that can flexibly control ambient molecule proportions and generate realistic joint snRNA/snATAC data. We use ambisim to evaluate demultiplexing methods across several important parameters: doublet rate, number of multiplexed donors, and coverage levels. Our simulations reveal that methods are variably impacted by ambient contamination in both modalities. We then apply the demultiplexing methods to two joint snRNA/snATAC datasets and find highly variable concordance between methods in both modalities. Finally, we develop a new metric, , which we show is correlated with cell-level ambient molecule fractions in singlets. Applying our metric to two multiplexed joint snRNA/snATAC datasets reveals variable ambient contamination across experiments and modalities. We conclude that improved modelling of ambient material in demultiplexing algorithms will increase both sensitivity and specificity.
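To make the idea of a controllable ambient molecule proportion concrete, the sketch below uses a simple two-component mixture: each read in a nucleus comes from that nucleus's own donor with probability 1 − α, or from a pooled ambient profile with probability α. This is an illustrative assumption, not ambisim's actual implementation; the function names `simulate_read_sources` and `expected_alt_fraction`, the ambient donor weights, and the genotype dosages are all hypothetical. The second function shows why ambient material matters for genotype-based demultiplexing: it shifts the expected allele fraction at a SNP away from what the nucleus's own genotype predicts.

```python
import random
from collections import Counter


def simulate_read_sources(cell_donor, ambient_weights, n_reads, ambient_frac, seed=0):
    """Hypothetical toy model (not ambisim's code): each read is endogenous with
    probability 1 - ambient_frac; otherwise its donor is drawn from the ambient
    pool, whose composition is given by ambient_weights (donor -> proportion)."""
    rng = random.Random(seed)
    donors = list(ambient_weights)
    weights = [ambient_weights[d] for d in donors]
    sources = []
    for _ in range(n_reads):
        if rng.random() < ambient_frac:
            # Ambient molecule: donor sampled from the pooled ambient profile.
            sources.append(rng.choices(donors, weights=weights, k=1)[0])
        else:
            # Endogenous molecule from the nucleus's own donor.
            sources.append(cell_donor)
    return Counter(sources)


def expected_alt_fraction(cell_donor, genotypes, ambient_weights, ambient_frac):
    """Expected alt-allele read fraction at one SNP under the same toy mixture.
    genotypes maps donor -> alt-allele dosage (0, 1 or 2); weights sum to 1."""
    endogenous = genotypes[cell_donor] / 2
    ambient = sum(w * genotypes[d] / 2 for d, w in ambient_weights.items())
    return (1 - ambient_frac) * endogenous + ambient_frac * ambient


genos = {"A": 0, "B": 2, "C": 1}           # hypothetical dosages at one SNP
pool = {"A": 0.5, "B": 0.25, "C": 0.25}    # hypothetical ambient composition
print(expected_alt_fraction("A", genos, pool, ambient_frac=0.2))  # about 0.075
print(simulate_read_sources("A", pool, n_reads=1000, ambient_frac=0.2, seed=1))
```

Under these assumptions, a nucleus from donor A, which is homozygous reference at the SNP, is still expected to show alternate-allele reads at about 7.5% of that site's reads when 20% of its molecules are ambient. A demultiplexer that does not model ambient material could read those alleles as evidence of a second donor, for example as a spurious doublet call.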