Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, NY 10032, United States.
Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, United States.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad481.
Droplet-based single-cell RNA sequencing (scRNA-seq) is widely used in biomedical research for interrogating the transcriptomes of single cells on a large scale. Pooling and processing cells from different samples together can reduce costs and batch effects. To enable pooling, cells are often first labeled with hashtag oligonucleotides (HTOs). These HTOs are sequenced alongside the cells' RNA in the droplets and are subsequently used to computationally assign each droplet to its sample of origin, a process referred to as demultiplexing. Accurate demultiplexing is crucial but can be challenging due to background HTOs, low-quality cells/cell debris, and multiplets.
A new demultiplexing method based on negative binomial regression mixture models is introduced. The method, called demuxmix, implements two significant improvements. First, demuxmix's probabilistic classification framework provides error probabilities for droplet assignments, which can be used to discard uncertain droplets and to assess the quality of the HTO data and the success of the demultiplexing process. Second, demuxmix utilizes the positive association between the number of detected genes in the RNA library and HTO counts to explain part of the variance in the HTO data, resulting in improved droplet assignments. The improved performance of demuxmix compared with existing demultiplexing methods is assessed using real and simulated data. Finally, the feasibility of accurately demultiplexing experimental designs in which non-labeled cells are pooled with labeled cells is demonstrated.
demuxmix is available as an R/Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.demuxmix).
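The classification idea described above can be illustrated with a minimal Python sketch. It is not the demuxmix implementation or its API (the package itself is written in R). It assumes that a two-component negative binomial regression mixture has already been fitted for one HTO, and it only shows how a posterior probability and an error probability for a single droplet would follow from such a fit. The names `Component`, `posterior_positive`, and `error_probability`, the log-link on the log number of detected genes, and all parameter values are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One mixture component; all parameter values used below are made up."""
    intercept: float  # log-scale intercept of the regression
    slope: float      # coefficient on log(number of detected genes)
    size: float       # negative binomial dispersion (larger = less overdispersion)

    def mean(self, n_genes):
        # Assumed log-link regression: log(mu) = intercept + slope * log(n_genes)
        return math.exp(self.intercept + self.slope * math.log(n_genes))


def nb_logpmf(k, mu, size):
    """Log-probability of count k under a negative binomial with mean mu."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))


def posterior_positive(hto_count, n_genes, neg, pos, pi_pos=0.5):
    """Posterior probability that a droplet is positive for the HTO."""
    log_neg = math.log(1.0 - pi_pos) + nb_logpmf(hto_count, neg.mean(n_genes), neg.size)
    log_pos = math.log(pi_pos) + nb_logpmf(hto_count, pos.mean(n_genes), pos.size)
    shift = max(log_neg, log_pos)  # subtract the maximum to avoid underflow
    w_neg = math.exp(log_neg - shift)
    w_pos = math.exp(log_pos - shift)
    return w_pos / (w_neg + w_pos)


def error_probability(p_pos):
    """Probability that the more likely of the two labels is wrong."""
    return min(p_pos, 1.0 - p_pos)


# Hypothetical fitted components: background (negative) and labeled (positive).
NEG = Component(intercept=-1.15, slope=0.5, size=5.0)
POS = Component(intercept=2.25, slope=0.5, size=5.0)

if __name__ == "__main__":
    for count, genes in [(5, 1000), (300, 1000), (60, 500), (60, 4000)]:
        p = posterior_positive(count, genes, NEG, POS)
        print(count, genes, round(p, 4), round(error_probability(p), 4))
```

In this sketch, the same HTO count of 60 yields a lower posterior probability of being positive for a droplet with 4000 detected genes than for one with 500, because the expected background count grows with the number of detected genes. Droplets whose error probability exceeds a chosen threshold could then be set aside as uncertain.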