Lun Aaron T L, Marioni John C
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
Biostatistics. 2017 Jul 1;18(3):451-464. doi: 10.1093/biostatistics/kxw055.
An increasing number of studies are using single-cell RNA-sequencing (scRNA-seq) to characterize the gene expression profiles of individual cells. One common analysis applied to scRNA-seq data involves detecting differentially expressed (DE) genes between cells in different biological groups. However, many experiments are designed such that the cells to be compared are processed in separate plates or chips, meaning that the groupings are confounded with systematic plate effects. This confounding aspect is frequently ignored in DE analyses of scRNA-seq data. In this article, we demonstrate that failing to consider plate effects in the statistical model results in loss of type I error control. A solution is proposed whereby counts are summed from all cells in each plate and the count sums for all plates are used in the DE analysis. This restores type I error control in the presence of plate effects without compromising detection power in simulated data. Summation is also robust to varying numbers and library sizes of cells on each plate. Similar results are observed in DE analyses of real data where the use of count sums instead of single-cell counts improves specificity and the ranking of relevant genes. This suggests that summation can assist in maintaining statistical rigour in DE analyses of scRNA-seq data with plate effects.
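The summation step described in the abstract can be illustrated with a minimal sketch. It assumes a genes-by-cells count matrix and one plate label per cell. The function name `sum_counts_by_plate`, the toy counts, and the plate labels are hypothetical and are not taken from the article. The sketch covers only the summation; the resulting genes-by-plates matrix would then be passed to a count-based DE method, which the abstract does not specify.

```python
import numpy as np

def sum_counts_by_plate(counts, plate_ids):
    """Sum single-cell counts over all cells that share a plate.

    counts    : array of shape (n_genes, n_cells), raw counts (assumed layout)
    plate_ids : sequence of length n_cells, the plate label of each cell
    Returns (plates, summed), where summed has shape (n_genes, n_plates).
    """
    counts = np.asarray(counts)
    plate_ids = np.asarray(plate_ids)
    if counts.shape[1] != plate_ids.shape[0]:
        raise ValueError("one plate label is required per cell (column)")

    plates = np.unique(plate_ids)  # sorted unique plate labels
    summed = np.zeros((counts.shape[0], plates.size), dtype=counts.dtype)
    for j, plate in enumerate(plates):
        # Add up the columns (cells) that belong to this plate.
        summed[:, j] = counts[:, plate_ids == plate].sum(axis=1)
    return plates, summed

# Toy data (hypothetical): 2 genes, 4 cells, 2 plates.
toy_counts = np.array([[1, 2, 3, 4],
                       [0, 5, 0, 1]])
toy_plates = ["A", "A", "B", "B"]
plates, summed = sum_counts_by_plate(toy_counts, toy_plates)
print(plates)
print(summed)
```

Each column of `summed` is one plate, so the plate rather than the individual cell becomes the unit of replication in the downstream DE analysis.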