Mukherjee Sach, Roberts Stephen J
Department of Engineering Science, University of Oxford, UK.
Proc IEEE Comput Syst Bioinform Conf. 2004:131-41.
A great deal of recent research has focused on the challenging task of selecting differentially expressed genes from microarray data ('gene selection'). Numerous gene selection algorithms have been proposed in the literature, but it is often unclear exactly how these algorithms respond to conditions such as small sample sizes or differing variances. Choosing an appropriate algorithm can therefore be difficult. In this paper we propose a theoretical analysis of gene selection, in which the probability of successfully selecting relevant genes, using a given gene ranking function, is explicitly calculated in terms of population parameters. The theory developed is applicable to any ranking function that has a known sampling distribution, or one that can be approximated analytically. In contrast to empirical methods, the analysis can easily be used to examine the behaviour of gene selection algorithms under a wide variety of conditions, even when the number of genes involved runs into the tens of thousands. The utility of our approach is illustrated by comparing three well-known gene ranking functions.
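To make the idea of calculating selection probability from population parameters concrete, the following is a minimal sketch and not the authors' formulation. It assumes a deliberately simple ranking function (the signed difference in class sample means), normally distributed expression values, equal class sizes, and independent genes. The function names and parameters (`delta`, `sigma_rel`, `sigma_null`, `n`, `m_null`) are hypothetical, chosen for illustration only. Under these assumptions the ranking statistic has a known normal sampling distribution, so the probability that one relevant gene outranks one or many irrelevant genes can be computed without simulation.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_pdf(x, mean=0.0, sd=1.0):
    """Probability density function of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_outranks_one(delta, sigma_rel, sigma_null, n):
    """P(relevant gene scores higher than a single irrelevant gene).

    Assumption: the score is the difference in sample means between two
    classes of n samples each, so the score of the relevant gene is
    N(delta, 2*sigma_rel^2/n) and that of the irrelevant gene is
    N(0, 2*sigma_null^2/n). Their difference is again normal.
    """
    sd_diff = math.sqrt(2.0 * (sigma_rel ** 2 + sigma_null ** 2) / n)
    return normal_cdf(delta / sd_diff)


def prob_top_ranked(delta, sigma_rel, sigma_null, n, m_null, grid=4001):
    """P(relevant gene scores higher than all m_null irrelevant genes).

    Assumption: genes are independent, so the probability is the integral
    of f_rel(x) * F_null(x)^m_null over x, evaluated here with a plain
    trapezoid rule on a grid spanning +/- 10 standard deviations.
    """
    sd_rel = sigma_rel * math.sqrt(2.0 / n)
    sd_null = sigma_null * math.sqrt(2.0 / n)
    lo = delta - 10.0 * sd_rel
    hi = delta + 10.0 * sd_rel
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        density = normal_pdf(x, delta, sd_rel)
        all_below = normal_cdf(x, 0.0, sd_null) ** m_null
        total += weight * density * all_below
    return total * step


if __name__ == "__main__":
    # Illustrative population parameters only; not taken from the paper.
    for n in (3, 5, 10, 20):
        p = prob_top_ranked(delta=2.0, sigma_rel=1.0, sigma_null=1.0,
                            n=n, m_null=10000)
        print(f"n={n:2d}  P(top-ranked among 10000 null genes) = {p:.4f}")
```

The cost of `prob_top_ranked` does not grow with `m_null`, which mirrors the abstract's point that an analytic treatment stays tractable when the number of genes runs into the tens of thousands. Changing `sigma_rel` relative to `sigma_null` gives a quick way to explore the effect of differing variances under the same assumptions.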