Department of Data Science, The Institute of Statistical Mathematics, Tokyo, Japan.
Stat Med. 2009 Sep 30;28(22):2801-16. doi: 10.1002/sim.3666.
The main role of high-throughput microarrays today is to screen relevant genes from a large pool of candidate genes. To prioritize genes for subsequent studies, a ranking of genes by the strength of their association with the phenotype is a relevant statistical output. In this article, we propose sample size calculations for microarray experiments based on gene ranking and selection using the non-parametric Mann-Whitney-Wilcoxon statistic. The non-parametric statistic is expected to make gene ranking more robust to deviations from normality and to possible changes of scale in the gene expression data when subsequent studies use different platforms, such as polymerase chain reaction-based platforms. An application to a data set from a clinical study of lymphoma is given.
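As a rough illustration of ranking genes by the Mann-Whitney-Wilcoxon statistic, the sketch below scores each gene by how far its scaled U statistic lies from 0.5 and sorts the genes accordingly. It is not the sample size calculation proposed in the article. The function names `mww_auc` and `rank_genes`, the ranking criterion |U/(n1 n2) - 0.5|, and the toy expression values are all assumptions made for this example.

```python
import math


def mww_auc(x, y):
    """Mann-Whitney U for group x versus group y, scaled to [0, 1].

    Returns U / (len(x) * len(y)): the estimated probability that a value
    from x exceeds a value from y, with ties counted as one half.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u / (len(x) * len(y))


def rank_genes(expr, labels):
    """Order gene indices by |scaled U - 0.5|, strongest association first.

    expr: list of per-gene lists of expression values (one per sample).
    labels: list of 0/1 class labels, one per sample.
    The ranking criterion is an assumption made for this illustration.
    """
    scores = []
    for g, row in enumerate(expr):
        x = [v for v, c in zip(row, labels) if c == 1]
        y = [v for v, c in zip(row, labels) if c == 0]
        scores.append((abs(mww_auc(x, y) - 0.5), g))
    # Sort by descending score; ties are broken by gene index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores]


# Hypothetical toy data: 3 genes, 4 samples in each of two classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = [
    [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 1.5, 4.5],  # gene 0: heavily overlapping
    [6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0, 4.0],  # gene 1: fully separated
    [1.0, 2.0, 5.0, 3.0, 4.0, 6.0, 7.0, 8.0],  # gene 2: nearly separated
]

ranking = rank_genes(expr, labels)

# A strictly increasing transform (here exp) changes the scale of the data
# but not the ordering of the values, so the ranking is unchanged.
ranking_exp = rank_genes([[math.exp(v) for v in row] for row in expr], labels)
```

Because the statistic depends only on the ordering of the values within each gene, `ranking` and `ranking_exp` agree. This is the kind of invariance to a change of scale that the abstract refers to when it mentions moving to a different platform.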