Department of Mathematics, Myongji University, Kyonggi 449-728, Korea.
Stat Med. 2011 Jul 20;30(16):2028-39. doi: 10.1002/sim.4235. Epub 2011 Apr 7.
A gene set in DNA microarrays is a group of genes that share a common biological function, chromosomal location, or regulation. This paper discusses the problem of jointly identifying multiple differentially expressed gene sets associated with a phenotype of interest from many hundreds of pre-defined gene sets in a microarray experiment. We propose a null hypothesis that any group of gene sets from the experiment is not differentially expressed. The hypothesis is applicable to a real microarray experiment, where only a fraction of gene sets examined in the experiment are differentially expressed. To test this hypothesis, we provide an algorithm called set association for tail strength (SATS). SATS assigns the tail-strength statistic (TS) to each gene set to measure differential expression that is related to the phenotype of interest, combines the statistics into an overall association measure of multiple gene sets by utilizing a set-association method, and then calculates the significance of the overall measure by conducting sample permutations. SATS performs a simultaneous significance test on several gene sets, while controlling the Type I error rate. As multiple gene sets work together toward the significance, SATS can capture correlations across gene sets that should be considered in assessing joint statistical significance.
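The three steps named in the abstract (a TS score per gene set, a set-association combination, and sample permutations) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these details, so several pieces are assumptions: per-gene p-values come from a Welch-type statistic with a normal approximation, TS is taken in the form TS = (1/m) Σ_k [1 − p_(k)(m+1)/k] over the sorted p-values of a set, and the set-association step is taken as the sum of the n largest set scores, with the smallest permutation p-value over n re-assessed on the same permutations. The names `gene_pvalues`, `set_scores`, `cumulative_top_sums`, `sats_sketch`, `max_sets`, and `n_perm` are hypothetical.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)  # elementwise complementary error function


def gene_pvalues(expr, labels):
    """Two-sided per-gene p-values (assumption: Welch statistic, normal approx.).

    expr is a genes x samples array; labels is a 0/1 array over samples.
    """
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    z = (a.mean(axis=1) - b.mean(axis=1)) / se
    return _erfc(np.abs(z) / math.sqrt(2.0))


def tail_strength(pvals):
    """TS = mean over k of 1 - p_(k) * (m + 1) / k, with p_(k) sorted ascending."""
    p = np.sort(pvals)
    m = p.size
    k = np.arange(1, m + 1)
    return float(np.mean(1.0 - p * (m + 1) / k))


def set_scores(expr, labels, gene_sets):
    """One TS score per gene set; gene_sets is a list of gene-index arrays."""
    p = gene_pvalues(expr, labels)
    return np.array([tail_strength(p[idx]) for idx in gene_sets])


def cumulative_top_sums(scores, max_sets):
    """Sums of the 1, 2, ..., max_sets largest set scores."""
    return np.cumsum(np.sort(scores)[::-1][:max_sets])


def sats_sketch(expr, labels, gene_sets, max_sets=5, n_perm=200, seed=0):
    """Return (number of sets selected, overall permutation p-value)."""
    rng = np.random.default_rng(seed)
    observed = cumulative_top_sums(set_scores(expr, labels, gene_sets), max_sets)
    # Permuting sample labels keeps gene-gene and set-set correlations intact.
    null = np.array([
        cumulative_top_sums(
            set_scores(expr, rng.permutation(labels), gene_sets), max_sets)
        for _ in range(n_perm)
    ])
    sums = np.vstack([observed, null])  # row 0 holds the observed sums
    # pv[r, n]: share of rows whose n-th sum is at least as large as row r's.
    pv = (sums[None, :, :] >= sums[:, None, :]).mean(axis=1)
    min_p = pv.min(axis=1)  # best p-value over n, for every row
    best_n = int(pv[0].argmin()) + 1
    # Second level: how often a row's minimum p-value matches or beats
    # the observed minimum, which accounts for the search over n.
    overall_p = float(np.mean(min_p <= min_p[0]))
    return best_n, overall_p
```

Calling `sats_sketch(expr, labels, gene_sets)` on a genes × samples matrix returns the number of top-ranked gene sets entering the best sum together with a single p-value for that group. Reusing one set of permutations for both levels keeps the cost at `n_perm` recomputations of the set scores.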