Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA, USA; UC Berkeley-UCSF Graduate Program in Bioengineering, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
Cell. 2024 Oct 31;187(22):6393-6410.e16. doi: 10.1016/j.cell.2024.09.044. Epub 2024 Oct 24.
Differential expression analysis of single-cell RNA sequencing (scRNA-seq) data is central to characterizing how experimental factors affect the distribution of gene expression. However, distinguishing between biological and technical sources of cell-cell variability and assessing the statistical significance of quantitative comparisons between cell groups remain challenging. We introduce Memento, a tool for robust and efficient differential analysis of mean expression, variability, and gene correlation from scRNA-seq data, scalable to millions of cells and thousands of samples. We applied Memento to 70,000 tracheal epithelial cells to identify interferon-responsive genes, 160,000 CRISPR-Cas9 perturbed T cells to reconstruct gene-regulatory networks, 1.2 million peripheral blood mononuclear cells (PBMCs) to map cell-type-specific quantitative trait loci (QTLs), and the 50-million-cell CELLxGENE Discover corpus to compare arbitrary cell groups. In all cases, Memento identified more significant and reproducible differences in mean expression compared with existing methods. It also identified differences in variability and gene correlation that suggest distinct transcriptional regulation mechanisms imparted by perturbations.