Xu Ronghui, Li Xiaochun
Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA.
Bioinformatics. 2003 Jul 1;19(10):1284-9. doi: 10.1093/bioinformatics/btg155.
In analyses of microarray data from designs that compare different biological conditions, ranking genes by their differential 'importance' is often desired, so that biologists can focus research on a small subset of genes that are most likely related to the experimental conditions. Permutation methods are often recommended and used in place of their parametric counterparts because microarray experiments have small sample sizes and the data may be non-normal. These recommendations, however, are based on classical knowledge from the hypothesis-testing setting.
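To make the discreteness issue concrete, here is a minimal Python sketch (not the authors' code) of an exact two-sample permutation test for a single gene. It assumes the absolute difference in group means as the statistic. For fixed pooled data this orders relabelings the same way as the pooled-variance |t|, so it is equivalent to a permutation t-test. The function name `permutation_pvalue` and the example values are hypothetical. With 3 arrays per condition there are only C(6, 3) = 20 relabelings, and every split has a complement with the same |difference|. The smallest attainable two-sided p-value is therefore 2/20 = 0.1, and at most ten distinct p-values are possible, so many genes tie when ranked by this p-value.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for one gene.

    Enumerates every split of the pooled values into groups of sizes
    len(x) and len(y) and returns the fraction of splits whose absolute
    mean difference is at least the observed one.
    """
    pooled = list(x) + list(y)
    n, nx = len(pooled), len(x)
    observed = abs(mean(x) - mean(y))
    hits = total = 0
    for idx in combinations(range(n), nx):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # small tolerance so floating-point rounding does not drop exact ties
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Made-up 3-vs-3 values: perfectly separated groups still cannot go below 0.1.
print(permutation_pvalue([10, 11, 12], [1, 2, 3]))  # 0.1
```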
We explore the relationship between hypothesis testing and gene ranking. We point out that the permutation method does not provide a metric for the distance between two underlying distributions. In our simulation studies, permutation methods tend to be as accurate as, or less accurate than, parametric methods in ranking genes. This is partly due to the discreteness of the permutation distributions and partly due to the non-metric property. In data analysis, the variability in gene rankings can be assessed by the bootstrap. It turns out that this variability is much lower for permutation methods than for parametric methods, which agrees with the known robustness of permutation methods to individual outliers in the data.
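The bootstrap assessment of ranking variability can be sketched as follows. The choices below are assumptions, since the abstract does not specify them. Arrays (columns) are resampled with replacement within each condition, and the same resampled columns are used for every gene so that correlation between genes is kept. Genes are re-ranked in each replicate (rank 1 = largest score, ties averaged), and each gene's variability is summarised as the standard deviation of its rank. The names `abs_pooled_t`, `ranks_by_score`, and `bootstrap_rank_sd` are hypothetical. Any per-gene score can be passed in, for example the parametric |t| below or a score derived from a permutation p-value.

```python
import math
import random
from statistics import mean, pstdev


def abs_pooled_t(x, y):
    """|t| of the pooled-variance two-sample t-test: one parametric score."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    diff = abs(mx - my)
    if ss == 0:  # both resampled groups constant: pooled variance is zero
        return math.inf if diff > 0 else 0.0
    sp2 = ss / (nx + ny - 2)
    return diff / math.sqrt(sp2 * (1 / nx + 1 / ny))


def ranks_by_score(scores):
    """Rank genes by score (rank 1 = largest); tied genes share the average rank."""
    order = sorted(range(len(scores)), key=lambda g: -scores[g])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def bootstrap_rank_sd(x_rows, y_rows, score, n_boot=200, seed=0):
    """Standard deviation of each gene's rank over bootstrap replicates.

    x_rows / y_rows: one row per gene, one column per array in each condition.
    Each replicate resamples the columns of each condition with replacement,
    using the same columns for every gene, then re-scores and re-ranks genes.
    """
    rng = random.Random(seed)
    nx, ny = len(x_rows[0]), len(y_rows[0])
    history = [[] for _ in x_rows]
    for _ in range(n_boot):
        cx = [rng.randrange(nx) for _ in range(nx)]
        cy = [rng.randrange(ny) for _ in range(ny)]
        scores = [score([xr[c] for c in cx], [yr[c] for c in cy])
                  for xr, yr in zip(x_rows, y_rows)]
        for g, r in enumerate(ranks_by_score(scores)):
            history[g].append(r)
    return [pstdev(h) for h in history]


# Hypothetical 3-gene, 3-vs-3 toy matrix (made-up values, not data from the paper).
x_rows = [[100, 101, 102], [0, 1, 2], [0, 5, 1]]
y_rows = [[0, 1, 2], [0, 1, 2], [2, 0, 3]]
print(bootstrap_rank_sd(x_rows, y_rows, abs_pooled_t))
```

Returning `math.inf` when both resampled groups are constant is an arbitrary choice made here so the sketch never divides by zero. With three arrays per group, a resample repeats a single array with probability 3/27 = 1/9 per group, so the case is not rare.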