Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada.
McGill University Genome Centre, Montreal, QC, H3A 0G1, Canada.
Nat Commun. 2020 Jun 1;11(1):2704. doi: 10.1038/s41467-020-16522-z.
Index hopping is the main cause of incorrect sample assignment of sequencing reads in multiplexed pooled libraries. We introduce a statistical model for estimating the sample index-hopping rate in multiplexed droplet-based single-cell RNA-seq data and for probabilistic inference of the true sample of origin of hopped reads. We analyze several datasets and estimate the sample index-hopping probability to range between 0.003 and 0.009, a small number that counter-intuitively gives rise to a large fraction of phantom molecules: the fraction exceeds 8% in more than 25% of samples and reaches as high as 85% in low-complexity samples. Phantom molecules lead to widespread complications in downstream analyses, including transcriptome mixing across cells, emergence of phantom copies of cells from other samples, and misclassification of empty droplets as cells. We demonstrate that our approach can correct for these artifacts by accurately purging the majority of phantom molecules from the data.
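A rough arithmetic sketch may help show why a hopping probability below 1% can still contaminate a sample heavily. It is not the statistical model introduced in the paper. It assumes, purely for illustration, that each read hops with a fixed probability and lands uniformly on one of the other samples in the pool. The function name `phantom_read_fraction` and the read counts are hypothetical. The sketch works on reads and ignores the molecule-level bookkeeping (cell barcode, UMI, gene), so it shows the direction of the effect rather than reproducing the reported figures.

```python
def phantom_read_fraction(read_counts, hop_prob):
    """Expected fraction of each sample's assigned reads that hopped in from elsewhere.

    Simplifying assumption (not from the paper): a read keeps its sample index
    with probability 1 - hop_prob, and otherwise is reassigned uniformly to one
    of the other samples in the pool.
    """
    n_samples = len(read_counts)
    total = sum(read_counts)
    fractions = []
    for own in read_counts:
        retained = own * (1 - hop_prob)                         # reads that stay put
        incoming = (total - own) * hop_prob / (n_samples - 1)   # reads hopping in
        fractions.append(incoming / (retained + incoming))
    return fractions


# Hypothetical pool: three deeply sequenced samples and one low-complexity sample.
reads = [100_000_000, 100_000_000, 100_000_000, 1_000_000]
for n, frac in zip(reads, phantom_read_fraction(reads, hop_prob=0.005)):
    print(f"{n:>11,d} reads: {frac:.2%} of assigned reads are phantom")
```

Under these assumptions the deep samples stay below 1% contamination, while roughly a third of the reads assigned to the small sample come from its neighbours, because the incoming reads scale with the size of the other libraries rather than with its own.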