Department of Statistics, Brigham Young University, Provo, UT 84602, USA.
Biostatistics. 2010 Jul;11(3):533-6. doi: 10.1093/biostatistics/kxq010. Epub 2010 Feb 24.
The biological complexity of gene expression makes simulation of gene expression data difficult. We propose a spike-in simulation that adds a single simulated gene to the data set of interest. Features of this spike-in gene may be manipulated to observe how often the spiked-in gene appears in the list of differentially expressed genes. This approach provides insight into the data analysis method, the observed data, and the manner in which the method and data interact without relying on indefensible assumptions regarding gene coexpression.
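The spike-in idea can be sketched in a few lines, under stated assumptions. The abstract does not specify the test statistic, the distribution of the spiked-in gene, or how the list of differentially expressed genes is formed. The sketch below assumes a Welch t-statistic, a normally distributed spike-in gene with a mean shift `effect` between two groups, and a list made of the `top_k` genes ranked by absolute statistic. The function names and parameters are hypothetical and chosen for illustration only.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for two samples (assumed test; not taken from the paper)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def spike_in_detection_rate(data, n_group1, effect, sd=1.0, top_k=10,
                            n_sim=200, seed=0):
    """Fraction of simulations in which one spiked-in gene lands in the top_k list.

    data: list of genes, each a list of expression values; the first
    n_group1 columns belong to group 1 and the remaining columns to group 2.
    """
    rng = random.Random(seed)
    n_group2 = len(data[0]) - n_group1

    # The observed genes are left untouched, so their statistics are computed once.
    observed = sorted(
        (abs(welch_t(row[:n_group1], row[n_group1:])) for row in data),
        reverse=True,
    )
    # The spiked-in gene enters the top_k list when it beats the k-th largest
    # observed statistic; with fewer than top_k observed genes it always does.
    threshold = observed[top_k - 1] if len(observed) >= top_k else float("-inf")

    hits = 0
    for _ in range(n_sim):
        # Simulate a single gene whose group means differ by `effect`.
        g1 = [rng.gauss(0.0, sd) for _ in range(n_group1)]
        g2 = [rng.gauss(effect, sd) for _ in range(n_group2)]
        if abs(welch_t(g1, g2)) > threshold:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Stand-in for "the data set of interest": 200 null genes, 5 samples per group.
    toy_rng = random.Random(1)
    toy_data = [[toy_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
    for effect in (0.0, 1.0, 2.0, 4.0):
        rate = spike_in_detection_rate(toy_data, n_group1=5, effect=effect)
        print(f"effect={effect}: detection rate={rate:.2f}")
```

Varying `effect`, `sd`, or `top_k` while the observed genes stay fixed shows how often the spiked-in gene is listed, which is the quantity the abstract proposes to study. Because only one gene is simulated, no coexpression structure has to be assumed for the rest of the data.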