Center for Biostatistics, Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr., Columbus, 43210, OH, USA.
BMC Bioinformatics. 2020 May 19;21(1):198. doi: 10.1186/s12859-020-3541-7.
Power analysis has become an essential step in the experimental design of current biomedical research. Complex designs that allow diverse correlation structures are commonly used in RNA-Seq experiments. However, the field currently lacks statistical methods to calculate sample size and estimate power for RNA-Seq differential expression studies using such designs. Simulation-based methods are well suited to fill this gap: because theoretical distributions of test statistics are typically unavailable for such designs, simulation provides numerical solutions.
In this paper, we propose a novel simulation-based procedure for power estimation of differential expression that employs generalized linear mixed effects models for correlated expression data. We also propose a new procedure for power estimation of differential expression that uses a bivariate negative binomial distribution for paired designs. We compare the performance of the likelihood ratio test and the Wald test under a variety of simulation scenarios with the proposed procedures. To achieve the desired false positive control, the null distribution of the test statistics was estimated from the simulated distribution and compared to the asymptotic chi-square distribution. In addition, we applied the procedure for paired designs to the TCGA breast cancer data set.
In summary, we provide a framework for power estimation of RNA-Seq differential expression under complex experimental designs. Simulation results demonstrate that both proposed procedures properly control the false positive rate at the nominal level.
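The general idea of simulation-based power estimation with an empirical null distribution can be illustrated with a minimal Python sketch for a single gene in a paired design. This is not the authors' procedure: it assumes a shared log-normal subject effect to induce within-pair correlation, draws negative binomial counts as a gamma-Poisson mixture, and substitutes a simple paired log-count Wald-type statistic for the mixed-model likelihood ratio and Wald tests. All function names and parameter values (`mean`, `dispersion`, `subject_sd`, `n_sim`) are hypothetical choices for illustration.

```python
import numpy as np


def simulate_paired_counts(rng, n_pairs, mean, dispersion, log_fc, subject_sd):
    """Simulate correlated control/treated counts for one gene (assumed model)."""
    # A shared subject-level effect makes the two counts of a pair correlated.
    subject = rng.normal(0.0, subject_sd, size=n_pairs)
    mu_control = mean * np.exp(subject)
    mu_treated = mean * np.exp(subject + log_fc)
    # Negative binomial as a gamma-Poisson mixture with variance mu + dispersion * mu^2.
    shape = 1.0 / dispersion
    control = rng.poisson(rng.gamma(shape, mu_control / shape))
    treated = rng.poisson(rng.gamma(shape, mu_treated / shape))
    return control, treated


def wald_statistic(control, treated):
    """Wald-type statistic on paired log-count differences (stand-in test)."""
    diff = np.log(treated + 0.5) - np.log(control + 0.5)
    se = diff.std(ddof=1) / np.sqrt(len(diff))
    return 0.0 if se == 0 else diff.mean() / se


def estimate_power(n_pairs, log_fc, mean=100.0, dispersion=0.2,
                   subject_sd=0.5, alpha=0.05, n_sim=2000, seed=0):
    """Estimate power using a simulated (empirical) null distribution."""
    rng = np.random.default_rng(seed)

    def draw_stats(effect):
        return np.array([
            abs(wald_statistic(*simulate_paired_counts(
                rng, n_pairs, mean, dispersion, effect, subject_sd)))
            for _ in range(n_sim)
        ])

    # Critical value taken from statistics simulated under no differential expression.
    critical = np.quantile(draw_stats(0.0), 1.0 - alpha)
    # Power is the fraction of alternative-hypothesis statistics beyond that value.
    return float(np.mean(draw_stats(log_fc) > critical))


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, estimate_power(n, np.log(2.0)))
```

Taking the critical value from the simulated null rather than from an asymptotic reference distribution is what keeps the false positive rate near the nominal level when the sample size is small; calling `estimate_power` with `log_fc=0.0` gives a quick check of that rate.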