Department of Mathematics and Statistics, Utah State University, Logan, UT, USA.
Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA.
BMC Genomics. 2018 Dec 20;19(1):953. doi: 10.1186/s12864-018-5236-2.
When genomics researchers design a high-throughput study to test for differential expression, some biological systems and research questions make it possible to collect paired samples from the same subject, and researchers can plan for a certain proportion of subjects to have paired samples. We consider the effect of this proportion of paired samples on the statistical power of the study, using characteristics of both count (RNA-Seq) and continuous (microarray) expression data from a colorectal cancer study.
We demonstrate that a higher proportion of subjects with paired samples yields higher statistical power, for various total numbers of samples, and for various strengths of subject-level confounding factors. In the design scenarios considered, the statistical power in a fully-paired design is substantially (and in many cases several times) greater than in an unpaired design.
For the many biological systems and research questions where paired samples are feasible and relevant, substantial gains in statistical power can be achieved at the study design stage when genomics researchers plan to use paired samples from the largest possible proportion of subjects. Any cost savings in a study design with unpaired samples are likely accompanied by underpowered and possibly biased results.
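The relationship between pairing and power described above can be illustrated with a rough calculation. The sketch below is not the simulation used in the study: it assumes a single continuous expression measure with a normally distributed subject effect (`sd_subject`), independent noise (`sd_noise`), a two-sided z-test, and a simple inverse-variance combination of the paired and unpaired estimates. All function and parameter names are hypothetical, and `paired_fraction` is defined here as the fraction of samples that come from paired subjects.

```python
from statistics import NormalDist


def power_two_sided(delta, se, alpha=0.05):
    """Power of a two-sided z-test for a true difference `delta` with standard error `se`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def design_power(total_samples, paired_fraction, delta, sd_subject, sd_noise, alpha=0.05):
    """Approximate power for a design mixing paired and unpaired samples.

    Assumption (illustrative only): paired subjects contribute two samples each,
    and the remaining samples are split evenly between the two conditions.
    """
    n_pairs = int(round(paired_fraction * total_samples / 2))
    n_single = (total_samples - 2 * n_pairs) // 2  # unpaired subjects per condition

    # Information (inverse variance) about the mean difference from each part.
    info = 0.0
    if n_pairs > 0:
        # Within-subject differences cancel the subject effect.
        info += n_pairs / (2 * sd_noise ** 2)
    if n_single > 0:
        # Between-subject comparison keeps the subject-level variance.
        info += n_single / (2 * (sd_subject ** 2 + sd_noise ** 2))
    if info == 0.0:
        raise ValueError("design contains no usable samples")

    se = (1 / info) ** 0.5
    return power_two_sided(delta, se, alpha)


if __name__ == "__main__":
    for frac in (0.0, 0.5, 1.0):
        p = design_power(total_samples=20, paired_fraction=frac,
                         delta=1.0, sd_subject=2.0, sd_noise=1.0)
        print(f"paired fraction {frac:.1f}: approximate power {p:.2f}")
```

Under these assumptions, power increases with the paired fraction whenever the subject-level variance is nonzero, because each paired subject removes that variance from the comparison. The gap between the unpaired and fully paired designs widens as `sd_subject` grows relative to `sd_noise`.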