demuxmix: Demultiplexing oligonucleotide-barcoded single-cell RNA sequencing data with regression mixture models.

Author information

Klein Hans-Ulrich

Affiliations

Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA.

Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA.

Publication information

bioRxiv. 2023 Jan 29:2023.01.27.525961. doi: 10.1101/2023.01.27.525961.

DOI:10.1101/2023.01.27.525961

PMID:36747615

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9901175/

Abstract

MOTIVATION

Droplet-based single-cell RNA sequencing (scRNA-seq) is widely used in biomedical research to interrogate the transcriptomes of single cells on a large scale. Pooling and processing cells from different samples together can reduce costs and batch effects. Before pooling, cells are often labeled with hashtag oligonucleotides (HTOs). These HTOs are sequenced along with the cells' RNA in the droplets and are subsequently used to computationally assign each droplet to its sample of origin, a step referred to as demultiplexing. Accurate demultiplexing is crucial but can be challenging due to background HTOs, low-quality cells/cell debris, and multiplets.

RESULTS

A new demultiplexing method, demuxmix, based on negative binomial regression mixture models is introduced. The method implements two significant improvements. First, demuxmix's probabilistic classification framework provides error probabilities for droplet assignments, which can be used to discard uncertain droplets and to assess the quality of the HTO data and the success of the demultiplexing. Second, demuxmix utilizes the positive association between the number of detected genes in the RNA library and HTO counts to explain part of the variance in the HTO data, resulting in improved droplet assignments. The improved performance of demuxmix compared to existing demultiplexing methods is assessed on real and simulated data. Finally, the feasibility of accurately demultiplexing experimental designs where non-labeled cells are pooled with labeled cells is demonstrated.
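The two ideas above (a mixture of negative binomial components whose means depend on the number of detected genes, and an error probability for each assignment) can be illustrated with a minimal Python sketch. This is not the demuxmix implementation, which is an R/Bioconductor package. The function names, the log-link form of the regression, the posterior threshold, and all parameter values in `example_params` are hypothetical and chosen only for illustration; in practice the parameters would be estimated from the data.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log-probability of count k under a negative binomial distribution
    with mean mu and dispersion parameter size (variance = mu + mu^2/size)."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))


def expected_count(n_genes, intercept, slope):
    """Assumed log-link regression: expected HTO count given detected genes."""
    return math.exp(intercept + slope * math.log(n_genes))


def posterior_positive(hto_count, n_genes, params):
    """Posterior probability that a droplet is positive for one HTO."""
    mu_neg = expected_count(n_genes, *params["neg_coef"])
    mu_pos = expected_count(n_genes, *params["pos_coef"])
    log_neg = (math.log(1.0 - params["pi_pos"])
               + nb_log_pmf(hto_count, mu_neg, params["neg_size"]))
    log_pos = (math.log(params["pi_pos"])
               + nb_log_pmf(hto_count, mu_pos, params["pos_size"]))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_neg, log_pos)
    w_neg = math.exp(log_neg - m)
    w_pos = math.exp(log_pos - m)
    return w_pos / (w_neg + w_pos)


def classify(hto_count, n_genes, params, min_prob=0.9):
    """Return (label, error_probability); droplets whose best class has a
    posterior below min_prob are labeled 'uncertain'."""
    p = posterior_positive(hto_count, n_genes, params)
    if p >= min_prob:
        return "positive", 1.0 - p
    if 1.0 - p >= min_prob:
        return "negative", p
    return "uncertain", min(p, 1.0 - p)


# Hypothetical parameters for illustration only (not fitted to any data set).
example_params = {
    "pi_pos": 0.25,           # assumed fraction of droplets positive for this HTO
    "neg_coef": (0.0, 0.4),   # (intercept, slope) of the background component
    "pos_coef": (3.0, 0.4),   # (intercept, slope) of the labeled component
    "neg_size": 5.0,
    "pos_size": 5.0,
}

if __name__ == "__main__":
    for count, genes in [(500, 2000), (15, 2000), (90, 500), (90, 5000)]:
        label, err = classify(count, genes, example_params)
        print(count, genes, label, err)
```

Under these assumed parameters, the same HTO count of 90 receives a higher posterior probability of being positive in a droplet with 500 detected genes than in one with 5000, because the expected background count grows with the number of detected genes. This is the kind of variance the regression term is meant to explain.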

AVAILABILITY

R/Bioconductor package demuxmix (https://doi.org/doi:10.18129/B9.bioc.demuxmix).
