Department of Statistics, Purdue University, West Lafayette, Indiana 47907, USA.
Genetics. 2010 Jun;185(2):405-16. doi: 10.1534/genetics.110.114983. Epub 2010 May 3.
Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations, we demonstrate the benefits of collecting replicated RNA sequencing data according to well-known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.
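The design concepts named in the abstract (randomization, replication, and blocking) can be illustrated with a minimal sketch of a randomized complete block layout. This is not the authors' code or their specific design. The function name `randomized_block_design`, the treatment labels, and the choice of four blocks are hypothetical, and treating a block as one batch of technical conditions (for example, a sequencing run) is an assumption made only for illustration.

```python
import random


def randomized_block_design(treatments, n_blocks, seed=None):
    """Place every treatment once in each block, in a random order.

    Blocking: each block holds a full set of treatments, so block-to-block
    (technical) variation is not confounded with treatment differences.
    Replication: each treatment appears n_blocks times overall.
    Randomization: the order within each block is shuffled.
    """
    rng = random.Random(seed)  # seeded so the layout can be reproduced
    design = {}
    for block in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        design[block] = order
    return design


# Hypothetical example: two treatment groups, four blocks.
design = randomized_block_design(["control", "treated"], n_blocks=4, seed=1)
for block, order in design.items():
    print(f"block {block}: {order}")
```

Because every block contains every treatment, a model for differential expression fitted to data collected this way can include a block term that absorbs technical variation separately from the treatment effect.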