Division of Molecular Hematology, Lund Stem Cell Center, Lund University, Lund, Sweden.
Institut Polytechnique de Paris, Paris, France.
Nat Commun. 2022 Aug 8;13(1):4616. doi: 10.1038/s41467-022-32097-3.
As the scale of single-cell genomics experiments grows into the millions of cells, the computational requirements to process these data are beyond the reach of many. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or low-cost devices like single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-stochastic neighbour embedding and a hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf can additionally generate a representative sample of cells from a given dataset in which rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on GitHub: https://github.com/parashardhapola/scarf.
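To make concrete the idea of subsampling while conserving rare cell populations, the following is a minimal sketch and not Scarf's own algorithm or API. It assumes cells already carry cluster labels and simply applies a per-cluster floor during stratified random sampling. The function name `subsample_preserving_rare` and its parameters `fraction` and `min_per_cluster` are hypothetical, introduced only for this illustration.

```python
import random
from collections import defaultdict


def subsample_preserving_rare(labels, fraction, min_per_cluster=5, seed=0):
    """Return sorted indices of a stratified subsample of cells.

    Illustrative only (not Scarf's method): each cluster keeps roughly
    `fraction` of its cells, but never fewer than `min_per_cluster`
    (or all of its cells, if the cluster is smaller than that floor).
    """
    rng = random.Random(seed)

    # Group cell indices by their cluster label.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    picked = []
    for members in by_cluster.values():
        # Proportional share, raised to the floor, capped at cluster size.
        n_keep = max(min_per_cluster, round(len(members) * fraction))
        n_keep = min(n_keep, len(members))
        picked.extend(rng.sample(members, n_keep))
    return sorted(picked)


# Toy data: one abundant population and one rare population.
labels = ["common"] * 1000 + ["rare"] * 8
kept = subsample_preserving_rare(labels, fraction=0.05)
kept_labels = [labels[i] for i in kept]
```

With purely proportional sampling at 5%, the eight-cell population would usually vanish; the per-cluster floor is what keeps it represented. A label-based floor does not by itself conserve differentiation trajectories, which would require using the structure of the cell neighbourhood graph.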