Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, United Kingdom.
Cambridge Institute for Medical Research, University of Cambridge, Cambridge, CB2 0XY, United Kingdom.
Nat Commun. 2018 Jul 10;9(1):2667. doi: 10.1038/s41467-018-05083-x.
Barcode swapping results in the mislabelling of sequencing reads between multiplexed samples on patterned flow-cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays; however, the severity and consequences of barcode swapping remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in two plate-based single-cell RNA-sequencing datasets. We found that approximately 2.5% of reads were mislabelled between samples on the HiSeq 4000, which is lower than previously reported estimates. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA-sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10x Genomics experiments, allowing the continued use of cutting-edge sequencing machines for these assays.
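The idea of excluding swapped molecules can be illustrated with a simplified sketch. It assumes that a molecule is identified by its (cell barcode, UMI, gene) combination, so the same molecule observed in several multiplexed samples must have swapped into all but one of them. The function `remove_swapped_molecules` and its `min_frac` threshold (0.8 here) are hypothetical illustrations, not the authors' implementation: each molecule is kept only in the sample holding at least `min_frac` of its reads and is discarded from every sample otherwise.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumption: a molecule is keyed by (cell barcode, UMI, gene).
Molecule = Tuple[str, str, str]


def remove_swapped_molecules(
    read_counts: Dict[str, Dict[Molecule, int]],
    min_frac: float = 0.8,  # hypothetical threshold, not taken from the paper
) -> Dict[str, Dict[Molecule, int]]:
    """Keep each molecule only in the sample that dominates its reads."""
    # Collect, for every molecule, its read count in each sample.
    per_molecule: Dict[Molecule, Dict[str, int]] = defaultdict(dict)
    for sample, molecules in read_counts.items():
        for mol, n_reads in molecules.items():
            per_molecule[mol][sample] = n_reads

    cleaned: Dict[str, Dict[Molecule, int]] = {s: {} for s in read_counts}
    for mol, by_sample in per_molecule.items():
        total = sum(by_sample.values())
        best_sample, best_n = max(by_sample.items(), key=lambda kv: kv[1])
        # Assign the molecule to its dominant sample only if that sample
        # holds at least min_frac of the reads; otherwise drop it everywhere.
        if best_n / total >= min_frac:
            cleaned[best_sample][mol] = best_n
    return cleaned


# Usage: one clearly dominated molecule, one ambiguous molecule,
# and one molecule seen in a single sample only.
counts = {
    "A": {("AAAC", "UMI1", "Gapdh"): 9, ("TTTG", "UMI2", "Actb"): 3},
    "B": {("AAAC", "UMI1", "Gapdh"): 1, ("TTTG", "UMI2", "Actb"): 3,
          ("CCGA", "UMI3", "Malat1"): 4},
}
result = remove_swapped_molecules(counts)
print(result)
```

Under these assumptions, the Gapdh molecule stays in sample A (9 of 10 reads), the Actb molecule is dropped from both samples because neither holds 80% of its reads, and the Malat1 molecule is kept in B. A threshold above 0.5 also means ties are always discarded rather than resolved arbitrarily.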