He Dongze, Zakeri Mohsen, Sarkar Hirak, Soneson Charlotte, Srivastava Avi, Patro Rob
Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
Nat Methods. 2022 Mar;19(3):316-322. doi: 10.1038/s41592-022-01408-3. Epub 2022 Mar 11.
The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (scRNA-seq and snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate normal gene expression count matrices.