Zimmerman Kip D, Langefeld Carl D
Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA.
BMC Genomics. 2021 May 1;22(1):319. doi: 10.1186/s12864-021-07635-w.
Study design is a critical aspect of any experiment, and sample size calculations for statistical power that are consistent with that study design are central to robust and reproducible results. However, existing power calculators for tests of differential expression in single-cell RNA-seq data focus on the total number of cells rather than the number of independent experimental units, which is the quantity that actually determines power. As a result, current methods grossly overestimate power.
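A rough way to see why counting cells overstates the available information is the classical design effect for clustered data. The sketch below is not part of Hierarchicell and is not the authors' method. It assumes an equal number of cells per individual and a single intraclass correlation (ICC) shared by all individuals, and the function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(n_individuals: int, cells_per_individual: int, icc: float) -> float:
    """Approximate number of independent observations in clustered data.

    Uses the classical design effect 1 + (m - 1) * ICC for m equally sized
    clusters. This is an illustrative assumption, not the Hierarchicell model.
    """
    if n_individuals < 1 or cells_per_individual < 1:
        raise ValueError("counts must be positive")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    total_cells = n_individuals * cells_per_individual
    design_effect = 1.0 + (cells_per_individual - 1) * icc
    return total_cells / design_effect


if __name__ == "__main__":
    # 10 individuals with 100 cells each, under several assumed ICC values.
    for icc in (0.0, 0.05, 0.5, 1.0):
        n_eff = effective_sample_size(10, 100, icc)
        print(f"ICC={icc:.2f}: effective sample size ~ {n_eff:.1f}")
```

Under these assumptions, 1,000 cells from 10 individuals behave like roughly 168 independent observations at an ICC of 0.05, and like only 10 when cells within an individual are perfectly correlated.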
Hierarchicell is the first single-cell power calculator to explicitly simulate and account for the hierarchical correlation structure (i.e., within-sample correlation) that exists in single-cell RNA-seq data. Hierarchicell, an R package available on GitHub, estimates the within-sample correlation structure from real data, uses it to simulate hierarchical single-cell RNA-seq data, and estimates power for tests of differential expression. This multi-stage approach models gene dropout rates, intra-individual dispersion, inter-individual variation, a variable or fixed number of cells per individual, and the correlation among cells within an individual. We demonstrate that estimates of power are falsely inflated when the within-sample correlation structure is not modeled and the correlation is not properly accounted for in downstream analysis. Hierarchicell can be used to estimate power for binary and continuous phenotypes based on a user-specified number of independent experimental units (e.g., individuals) and number of cells within each experimental unit.
Hierarchicell is a user-friendly R package that provides accurate estimates of power for testing hypotheses of differential expression in single-cell RNA-seq data. This R package represents an important addition to single-cell RNA-seq analytic tools and will help researchers design experiments with appropriate and accurate power, increasing discovery and improving robustness and reproducibility.
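The mechanism behind the inflation can be illustrated with a toy simulation. The sketch below does not use Hierarchicell or reproduce its model: it swaps in Gaussian expression values with a random per-individual offset instead of count data with dropout, and every name and parameter value in it is a hypothetical choice. With no true group difference, it compares a test that treats every cell as independent against a test on per-individual means.

```python
import random

N_IND = 5            # individuals per group (assumed)
T_CRIT_DF8 = 2.306   # two-sided 5% t critical value for 2 * N_IND - 2 = 8 df
Z_CRIT = 1.96        # normal approximation, used for the large cell-level test


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulate_group(rng, n_cells, sd_between, sd_within):
    """One list of cell values per individual; there is no group effect."""
    group = []
    for _ in range(N_IND):
        offset = rng.gauss(0.0, sd_between)  # shared by all cells of this individual
        group.append([rng.gauss(offset, sd_within) for _ in range(n_cells)])
    return group


def false_positive_rates(n_sims=300, n_cells=50, sd_between=1.0, sd_within=1.0, seed=1):
    """Return (cell-level rate, individual-level rate) of rejections under the null."""
    rng = random.Random(seed)
    naive = aggregated = 0
    for _ in range(n_sims):
        g1 = simulate_group(rng, n_cells, sd_between, sd_within)
        g2 = simulate_group(rng, n_cells, sd_between, sd_within)
        # Naive analysis: pool all cells and ignore which individual they came from.
        cells1 = [x for ind in g1 for x in ind]
        cells2 = [x for ind in g2 for x in ind]
        naive += abs(pooled_t(cells1, cells2)) > Z_CRIT
        # Aggregated analysis: one mean per individual, so units are independent.
        means1 = [mean(ind) for ind in g1]
        means2 = [mean(ind) for ind in g2]
        aggregated += abs(pooled_t(means1, means2)) > T_CRIT_DF8
    return naive / n_sims, aggregated / n_sims


if __name__ == "__main__":
    naive_rate, aggregated_rate = false_positive_rates()
    print(f"cell-level test: {naive_rate:.2f}, individual-level test: {aggregated_rate:.2f}")
```

In this toy setting the cell-level test rejects far more often than the nominal 5% even though no effect exists, while the individual-level test stays near the nominal rate. A power calculation built on the cell-level test would therefore report rejections that reflect ignored correlation rather than real sensitivity.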