Zhang Kai, Zemke Nathan R, Armand Ethan J, Ren Bing
bioRxiv. 2023 Sep 15:2023.09.11.557221. doi: 10.1101/2023.09.11.557221.
Single-cell omics technologies have ushered in a new era for the study of dynamic gene regulation in complex tissues during development and disease pathogenesis. A major computational challenge in analyzing these datasets is to project the large-scale, high-dimensional data into a low-dimensional space while retaining the relative relationships between cells, in order to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Conventional dimensionality reduction methods suffer from computational inefficiency, difficulty capturing the full spectrum of cellular heterogeneity, or an inability to be applied across diverse molecular modalities. Here, we report a fast, nonlinear dimensionality reduction algorithm that not only captures the heterogeneity of single-cell omics data more accurately, but is also computationally efficient, with runtime and memory usage that scale linearly with the number of cells. We implement this algorithm in a Python package named SnapATAC2 and demonstrate its superior performance, remarkable scalability, and general adaptability using an array of single-cell omics data types, including single-cell ATAC-seq, single-cell RNA-seq, single-cell Hi-C, and single-cell multiomics datasets.
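The abstract does not describe how the algorithm achieves linear runtime and memory. The sketch below is therefore only an illustration under stated assumptions, not SnapATAC2's implementation or API. It shows one generic way a nonlinear (spectral) embedding can avoid building the n × n cell-by-cell similarity matrix. It assumes cosine similarity between cells, a symmetrically normalized similarity graph, and ARPACK's Lanczos solver (`scipy.sparse.linalg.eigsh`) driven by a matrix-free `LinearOperator`. The function name `matrix_free_spectral_embedding` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh


def matrix_free_spectral_embedding(X, n_comps=10):
    """Hypothetical sketch: spectral embedding without forming the n x n matrix.

    Assumptions (not taken from the source): X is a non-negative
    cells x features matrix, similarity is cosine similarity, and the graph
    is normalized as D^{-1/2} S D^{-1/2}. Returns eigenvalues (descending)
    and an embedding of shape (n_cells, n_comps).
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    # L2-normalize each row so that S = X @ X.T holds pairwise cosine similarities.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1))).ravel()
    norms[norms == 0] = 1.0  # leave empty cells as zero rows
    X = sp.diags(1.0 / norms) @ X
    n = X.shape[0]

    # Degree of each cell (row sums of S), computed as X (X^T 1) in O(nnz(X)).
    deg = X @ (X.T @ np.ones(n))
    deg[deg <= 0] = 1.0  # guard against isolated (empty) cells
    d_inv_sqrt = 1.0 / np.sqrt(deg)

    def matvec(v):
        # Apply D^{-1/2} X X^T D^{-1/2} to v; S itself is never materialized.
        v = np.asarray(v).ravel()
        return d_inv_sqrt * (X @ (X.T @ (d_inv_sqrt * v)))

    op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
    # Largest algebraic eigenpairs of the normalized similarity operator.
    evals, evecs = eigsh(op, k=n_comps, which="LA")
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

Each operator application costs time proportional to the number of nonzeros in `X`, and the solver stores only a few length-n vectors. Both therefore grow linearly with the number of cells for a fixed number of features and components. Under this normalization the leading eigenvalue is 1, and its eigenvector is proportional to the square root of the degree. That component is uninformative, so it is usually dropped from the embedding.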