Department of Computer Science, Stony Brook University, Stony Brook, USA.
Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK.
Genome Biol. 2019 Mar 27;20(1):65. doi: 10.1186/s13059-019-1670-y.
We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is also considerably faster than existing gene quantification approaches, typically by a factor of eight, while using less memory.
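The bias attributed above to tools that discard gene-ambiguous reads can be illustrated with a toy example. The sketch below is not alevin's algorithm. It is a hypothetical baseline counter that deduplicates UMIs per cell and gene using gene-unique reads only, and then reports how many distinct UMIs are lost because all of their reads multimap between genes. The function names, the read tuples, and the gene labels are all invented for illustration.

```python
from collections import defaultdict


def count_unique_only(reads):
    """Count distinct UMIs per (cell, gene), discarding gene-ambiguous reads.

    Each read is a tuple (cell_barcode, umi, genes), where genes is the set
    of genes the read maps to. This is an assumed, simplified representation.
    """
    seen = defaultdict(set)
    for cell, umi, genes in reads:
        if len(genes) == 1:          # keep gene-unique reads only
            (gene,) = genes
            seen[(cell, gene)].add(umi)  # the set collapses PCR duplicates
    return {key: len(umis) for key, umis in seen.items()}


def count_discarded_umis(reads):
    """Count distinct (cell, UMI) pairs supported only by multimapping reads."""
    unique_pairs = {(c, u) for c, u, g in reads if len(g) == 1}
    ambiguous_pairs = {(c, u) for c, u, g in reads if len(g) > 1}
    return len(ambiguous_pairs - unique_pairs)


# Invented toy data: five reads from one cell.
reads = [
    ("CELL1", "AAAA", frozenset({"geneA"})),
    ("CELL1", "AAAA", frozenset({"geneA"})),           # PCR duplicate
    ("CELL1", "CCCC", frozenset({"geneA", "geneB"})),  # multimaps
    ("CELL1", "GGGG", frozenset({"geneB"})),
    ("CELL1", "TTTT", frozenset({"geneA", "geneB"})),  # multimaps
]

counts = count_unique_only(reads)
lost = count_discarded_umis(reads)
print(counts)
print("UMIs dropped as gene-ambiguous:", lost)
```

In this toy input, two of the four distinct UMIs never contribute to any gene count, so the baseline underestimates expression for genes that share sequence. The abstract states that alevin instead accounts for such reads; how it resolves them is not shown here.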