Department of Computer Science, Princeton University, Princeton, NJ, USA.
Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
Genome Biol. 2020 Jul 3;21(1):160. doi: 10.1186/s13059-020-02078-0.
Single-cell RNA-seq (scRNA-seq) profiles gene expression of individual cells. Unique molecular identifiers (UMIs) remove duplicates in read counts resulting from polymerase chain reaction, a major source of noise. For scRNA-seq data lacking UMIs, we propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets having both reads and UMIs, quasi-UMI normalization has higher accuracy than competing methods. Using quasi-UMIs enables methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.
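The quantile normalization described in the abstract can be illustrated with a minimal sketch. It assumes the compound Poisson target is a zero-truncated Poisson-lognormal and that normalization is applied to the nonzero read counts of one cell at a time; both are assumptions made for illustration. The function names `target_quantiles` and `quasi_umi_cell` and the parameters `mu` and `sigma` are hypothetical, and the default values are placeholders rather than values derived from UMI datasets.

```python
import numpy as np


def target_quantiles(n, mu=0.0, sigma=2.0, draws=200_000, seed=0):
    """Approximate n evenly spaced quantiles of a zero-truncated
    Poisson-lognormal distribution by Monte Carlo sampling.

    mu and sigma are hypothetical shape parameters (assumption).
    """
    rng = np.random.default_rng(seed)
    # Compound Poisson: the Poisson rate is itself lognormally distributed.
    rates = rng.lognormal(mean=mu, sigma=sigma, size=draws)
    samples = rng.poisson(rates)
    samples = np.sort(samples[samples > 0])  # keep only nonzero counts
    # Midpoint probabilities in (0, 1), one per nonzero gene.
    probs = (np.arange(1, n + 1) - 0.5) / n
    idx = np.floor(probs * samples.size).astype(int)
    return samples[idx].astype(np.int64)


def quasi_umi_cell(reads, **kwargs):
    """Map the nonzero read counts of one cell onto target quantiles.

    Zeros stay zero, the rank order of genes is preserved, and tied
    read counts receive the same quasi-UMI value.
    """
    reads = np.asarray(reads)
    out = np.zeros(reads.shape, dtype=np.int64)
    nonzero = reads > 0
    n = int(nonzero.sum())
    if n == 0:
        return out
    q = target_quantiles(n, **kwargs)
    # Group tied read counts and give each group the quantile at the
    # middle sorted position of that group.
    _, inverse, counts = np.unique(
        reads[nonzero], return_inverse=True, return_counts=True
    )
    ends = np.cumsum(counts)
    mids = (2 * ends - counts - 1) // 2
    out[nonzero] = q[mids][inverse]
    return out


# Toy read counts for seven genes in a single cell (made-up values).
reads = np.array([0, 10, 10, 250, 3000, 0, 45])
qumi = quasi_umi_cell(reads)
```

Because the mapping depends only on ranks within a cell, multiplying every read count by the same amplification factor would leave the quasi-UMI counts unchanged, which is the sense in which this kind of normalization reduces PCR-driven distortion.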