Wu Chih-Hsuan, Zhou Xiang, Chen Mengjie
Department of Statistics, University of Chicago, Chicago, USA.
Department of Biostatistics, University of Michigan, Ann Arbor, USA.
Genome Biol. 2025 Mar 17;26(1):58. doi: 10.1186/s13059-025-03525-6.
Differential expression analysis is pivotal in single-cell transcriptomics for unraveling cell-type-specific responses to stimuli. While numerous methods are available to identify differentially expressed genes in single-cell data, recent evaluations of both single-cell-specific methods and methods adapted from bulk studies have revealed significant shortcomings in performance. In this paper, we dissect the four major challenges in single-cell differential expression analysis: excessive zeros, normalization, donor effects, and cumulative biases. These "curses" underscore the limitations and conceptual pitfalls in existing workflows.
To address the limitations of current single-cell differential expression analysis methods, we propose GLIMES, a statistical framework that leverages UMI counts and zero proportions within a generalized Poisson/binomial mixed-effects model to account for batch effects and within-sample variation. We rigorously benchmarked GLIMES against six existing differential expression methods using three case studies and simulations across different experimental scenarios, including comparisons across cell types, tissue regions, and cell states. Our results demonstrate that GLIMES is more adaptable to diverse experimental designs in single-cell studies and effectively mitigates key shortcomings of current approaches, particularly those related to normalization procedures. By preserving biologically meaningful signals, GLIMES offers improved performance in detecting differentially expressed genes.
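The abstract does not give the exact form of the GLIMES model. The sketch below illustrates only the two ingredients it names, UMI counts and zero proportions. It uses a fixed-effects Poisson likelihood-ratio test on raw UMI counts, with no library-size offset in keeping with the "absolute expression" idea, and a binomial likelihood-ratio test on per-cell zero indicators. Both compare two cell groups for a single gene. It deliberately omits the donor/batch random effects that GLIMES includes, so it is not the authors' method and would be anti-conservative when cells come from several donors. All function names are hypothetical.

```python
import math
from typing import Sequence


def xlogy(x: float, y: float) -> float:
    """Return x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def chi2_df1_sf(stat: float) -> float:
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 df."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def poisson_count_lrt(counts: Sequence[int], groups: Sequence[int]) -> tuple[float, float]:
    """Test log(mu) = b0 + b1 * group on raw UMI counts (no normalization offset).

    groups holds 0/1 labels; both groups must be non-empty.
    Returns (log rate ratio of group 1 vs group 0, p-value).
    """
    n, s = [0, 0], [0, 0]
    for y, g in zip(counts, groups):
        n[g] += 1
        s[g] += y
    # Alternative: one rate per group (MLE = group mean). Null: one shared rate.
    # The "-mu" terms equal the total count under both models and cancel.
    ll_alt = sum(xlogy(s[g], s[g] / n[g]) for g in (0, 1))
    ll_null = xlogy(sum(s), sum(s) / sum(n))
    stat = 2.0 * (ll_alt - ll_null)
    if s[0] > 0 and s[1] > 0:
        log_rr = math.log(s[1] / n[1]) - math.log(s[0] / n[0])
    else:
        log_rr = float("nan")  # undefined when one group has no counts
    return log_rr, chi2_df1_sf(stat)


def zero_proportion_lrt(counts: Sequence[int], groups: Sequence[int]) -> tuple[float, float, float]:
    """Binomial LRT comparing the fraction of zero-count cells between two groups.

    Returns (zero fraction in group 0, zero fraction in group 1, p-value).
    """
    n, k = [0, 0], [0, 0]
    for y, g in zip(counts, groups):
        n[g] += 1
        k[g] += int(y == 0)

    def loglik(zeros: int, cells: int) -> float:
        p = zeros / cells  # binomial MLE of the zero probability
        return xlogy(zeros, p) + xlogy(cells - zeros, 1.0 - p)

    stat = 2.0 * (loglik(k[0], n[0]) + loglik(k[1], n[1]) - loglik(sum(k), sum(n)))
    return k[0] / n[0], k[1] / n[1], chi2_df1_sf(stat)


# Usage on toy counts for one gene in 8 + 8 cells (illustrative numbers only).
toy_counts = [0, 1, 0, 1, 0, 1, 0, 1, 10, 12, 9, 11, 10, 8, 12, 9]
toy_groups = [0] * 8 + [1] * 8
log_rr, p_count = poisson_count_lrt(toy_counts, toy_groups)
z0, z1, p_zero = zero_proportion_lrt(toy_counts, toy_groups)
```

In a real analysis each test would gain a per-donor random intercept, which is the step that turns this sketch into a mixed-effects model.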
By using absolute RNA expression rather than relative abundance, GLIMES improves sensitivity, reduces false discoveries, and enhances biological interpretability. This paradigm shift challenges existing workflows and highlights the need for careful consideration of normalization strategies, ultimately paving the way for more accurate and robust single-cell transcriptomic analyses.
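The contrast between absolute and relative expression can be illustrated with a toy calculation. Assume two hypothetical cells in which geneX has the same UMI count but only geneY increases. Counts-per-million (CPM) scaling, one common relative-abundance normalization chosen here for illustration, then makes geneX look halved. The gene names and numbers are invented and are not data from the study.

```python
def cpm(counts: dict[str, float]) -> dict[str, float]:
    """Scale each gene's count by the cell's total count, times one million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


cell_a = {"geneX": 10, "geneY": 90}   # 100 UMIs in total
cell_b = {"geneX": 10, "geneY": 190}  # 200 UMIs in total: only geneY went up

raw_ratio = cell_b["geneX"] / cell_a["geneX"]                # absolute: unchanged
cpm_ratio = cpm(cell_b)["geneX"] / cpm(cell_a)["geneX"]      # relative: halved
```

A test on the CPM values would report geneX as twofold down-regulated even though its absolute count did not change. This is the kind of normalization-induced distortion that modeling raw UMI counts aims to avoid.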