Estimating and correcting index hopping misassignments in single-cell RNA-seq data.

Author information

Miao Lingling, Collado Loren, Barkdull Savannah, Saito Yoshine, Jo Jay-Hyun, Han Jungmin, Dell'Orso Stefania, Kelly Michael C, Kong Heidi H, Brownell Isaac

Publication information

bioRxiv. 2024 Dec 16:2024.10.21.619353. doi: 10.1101/2024.10.21.619353.

Abstract

BACKGROUND

Index hopping causes read assignment errors in data from multiplexed sequencing libraries. This issue has become more prevalent with the widespread use of high-capacity sequencers and highly multiplexed single-cell RNA sequencing (scRNA-seq).

RESULTS

We conducted deep, plate-based scRNA-seq on a mixed population of mouse skin cells. Analysis of transcriptomes from 1152 cells identified four distinct cell types. To estimate the error rate in sample assignment due to index hopping, we employed differential expression analysis to identify signature genes that were highly and specifically expressed in each cell type. We quantified the proportion of misassigned reads by examining the detection rates of signature genes in other cell types. Remarkably, regardless of gene expression levels, we estimated that 0.65% of reads per gene were assigned to an incorrect cell across our data. To computationally compensate for index hopping, we developed a simple correction method wherein, for each gene, 0.65% of the library's average expression level was subtracted from the expression in each cell. This correction had notable effects on transcriptome analyses, including increased cell-cell clustering distance and alterations in intermediate state assignments of cell differentiation.
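The correction described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `correct_index_hopping`, the genes-by-cells list layout, the toy values, and the choice to floor corrected values at zero are all assumptions. The only details taken from the abstract are the 0.65% rate and the rule of subtracting that fraction of each gene's library-wide average expression from every cell.

```python
HOP_RATE = 0.0065  # 0.65% per-gene misassignment rate reported in the abstract


def correct_index_hopping(counts, hop_rate=HOP_RATE):
    """Subtract hop_rate * (gene's library-wide mean) from every cell.

    counts: list of per-gene rows, each row a list of per-cell expression
    values (hypothetical layout chosen for this sketch).
    Negative results are floored at zero; that floor is an assumption,
    since the abstract does not say how negative values are handled.
    """
    corrected = []
    for gene_row in counts:
        gene_mean = sum(gene_row) / len(gene_row)  # library average for this gene
        offset = hop_rate * gene_mean              # expected hopped-in signal per cell
        corrected.append([max(value - offset, 0.0) for value in gene_row])
    return corrected


# Toy matrix: gene 0 is highly expressed in cell 0 only; gene 1 is uniform.
counts = [
    [10000.0, 0.0, 65.0, 0.0],
    [5.0, 5.0, 5.0, 5.0],
]
corrected = correct_index_hopping(counts)
```

Because the offset scales with each gene's library-wide mean, highly expressed genes lose the most signal in absolute terms, while cells that never expressed the gene stay at zero under the assumed floor.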

CONCLUSIONS

Index hopping misassignments are measurable and can impact the experimental interpretation of sequencing results. We devised a straightforward method to estimate and correct for the index hopping rate by quantifying misassigned genes in distinct cell types within an scRNA-seq library. This approach can be applied to any barcoded, multiplexed scRNA-seq library containing cells with distinct expression profiles, allowing for correction of the expression matrix before conducting biological analysis.
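One way to read the estimation step is sketched below in Python. The abstract does not give the exact formula, so this block assumes a simple model in which hopped reads for a signature gene spread evenly across all cells; under that assumption the rate is the mean expression in cells of other types divided by the gene's library-wide mean. The function name `estimate_hop_rate`, the cell-type labels, and the toy values are hypothetical.

```python
def estimate_hop_rate(gene_row, cell_types, signature_type):
    """Estimate the index hopping rate from one signature gene.

    gene_row: per-cell expression values for a gene assumed to be truly
    expressed only in cells of signature_type.
    cell_types: per-cell type labels, same length as gene_row.
    Assumed model (not from the source): any signal in other cell types
    is misassigned and equals rate * library-wide mean of the gene.
    """
    if len(gene_row) != len(cell_types):
        raise ValueError("gene_row and cell_types must have the same length")
    off_target = [v for v, t in zip(gene_row, cell_types) if t != signature_type]
    if not off_target:
        raise ValueError("no cells of other types to measure misassignment in")
    library_mean = sum(gene_row) / len(gene_row)
    if library_mean == 0:
        raise ValueError("gene has no expression in the library")
    off_target_mean = sum(off_target) / len(off_target)
    return off_target_mean / library_mean


# Toy data: a signature gene for type "A" with low-level signal in type "B" cells.
gene_row = [4000.0, 4000.0, 13.0, 13.0]
cell_types = ["A", "A", "B", "B"]
rate = estimate_hop_rate(gene_row, cell_types, "A")
```

In practice one would average such estimates over many signature genes and cell types; the definition of the ratio here was chosen so that it matches the subtraction rule stated in the abstract (a fraction of the library's average expression per gene).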
