Zhang Zhaojun, Mathew Divij, Lim Tristan L, Mason Kaishu, Martinez Clara Morral, Huang Sijia, Wherry E John, Susztak Katalin, Minn Andy J, Ma Zongming, Zhang Nancy R
Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Nat Biotechnol. 2024 Nov 26. doi: 10.1038/s41587-024-02463-1.
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a 'pool-of-controls' design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
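The 'pool-of-controls' idea in the abstract can be illustrated with a toy sketch. This is not the CellANOVA algorithm or its API. It is a simplified stand-in that assumes unwanted variation can be estimated as the leading directions of variation among control-sample means and then projected out of every sample. All names (`batch_basis_from_controls`, `remove_directions`, `simulate`) and all simulated values are hypothetical.

```python
import numpy as np

def batch_basis_from_controls(control_samples, k):
    """Top-k directions of variation among control-sample means (toy estimate)."""
    means = np.stack([s.mean(axis=0) for s in control_samples])
    centered = means - means.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def remove_directions(cells, basis):
    """Project cells onto the orthogonal complement of the basis rows."""
    return cells - cells @ basis.T @ basis

rng = np.random.default_rng(0)
d, n = 5, 200                 # embedding dimension, cells per sample
batch_dir = np.eye(d)[0]      # assumed direction of unwanted (batch) variation
treat_dir = np.eye(d)[1]      # assumed direction of the biological effect

def simulate(batch_shift, treated):
    """One sample: noise + batch offset (+ a treatment shift of 2.0 if treated)."""
    x = rng.normal(scale=0.1, size=(n, d))
    return x + batch_shift * batch_dir + (2.0 if treated else 0.0) * treat_dir

controls = [simulate(s, False) for s in (-3.0, -1.0, 1.0, 3.0)]
treated = [simulate(s, True) for s in (-2.0, 2.0)]

# Estimate unwanted variation from controls only, then remove it everywhere.
basis = batch_basis_from_controls(controls, k=1)
controls_corr = [remove_directions(s, basis) for s in controls]
treated_corr = [remove_directions(s, basis) for s in treated]

def spread(samples):
    """Norm of the centered sample means: how far samples sit from each other."""
    means = np.stack([s.mean(axis=0) for s in samples])
    return float(np.linalg.norm(means - means.mean(axis=0)))

control_spread_before = spread(controls)
control_spread_after = spread(controls_corr)

# Treated-versus-control shift along the treatment direction after correction.
treatment_shift = float(
    (np.concatenate(treated_corr).mean(axis=0)
     - np.concatenate(controls_corr).mean(axis=0)) @ treat_dir
)
```

In this toy setup the controls collapse onto each other after correction while the treated-versus-control shift survives, because the simulated treatment direction is orthogonal to the batch direction. If the two directions overlapped, a plain projection like this would also remove part of the biological signal.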