Improved gene selection for classification of microarrays.

Authors

Jaeger J, Sengupta R, Ruzzo W L

Affiliation

Department of Computer Science & Engineering, University of Washington, 114 Sieg Hall, Box 352350, Seattle, WA 98195, USA.

Publication

Pac Symp Biocomput. 2003:53-64. doi: 10.1142/9789812776303_0006.

PMID:12603017

Abstract

In this paper we derive a method for evaluating and improving techniques for selecting informative genes from microarray data. Genes of interest are typically selected by ranking genes according to a test statistic and then choosing the top k genes. A problem with this approach is that many of these genes are highly correlated. For classification purposes it would be ideal to have distinct but still highly informative genes. We propose three different pre-filter methods (two based on clustering and one based on correlation) to retrieve groups of similar genes. For these groups we apply a test statistic to finally select genes of interest. We show that this filtered set of genes can be used to significantly improve existing classifiers.
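
The idea of combining a ranking by test statistic with a correlation-based filter can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not spell out; it is one simple greedy variant under stated assumptions. The function names `t_statistic` and `select_genes`, the use of a Welch t statistic, and the `max_corr` threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def t_statistic(X, y):
    """Welch t statistic per gene (an assumed choice of test statistic).

    X: array of shape (samples, genes); y: array of 0/1 class labels.
    Assumes every gene has nonzero variance in at least one class.
    """
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return num / den

def select_genes(X, y, k, max_corr=0.9):
    """Greedy sketch: walk genes from best to worst score and keep a gene
    only if its absolute Pearson correlation with every gene already kept
    is below max_corr. Returns up to k gene indices."""
    scores = np.abs(t_statistic(X, y))
    order = np.argsort(-scores)              # best-scoring gene first
    corr = np.corrcoef(X, rowvar=False)      # gene-by-gene correlation matrix
    chosen = []
    for g in order:
        if all(abs(corr[g, c]) < max_corr for c in chosen):
            chosen.append(int(g))
        if len(chosen) == k:
            break
    return chosen

# Toy data: genes 0 and 1 are near-duplicates, gene 2 is unrelated to both.
y = np.array([0, 0, 0, 1, 1, 1])
X = np.array([
    [1.0, 1.1, 5.0],
    [2.0, 2.1, 1.0],
    [3.0, 3.0, 3.0],
    [7.0, 7.0, 4.0],
    [8.0, 8.1, 6.0],
    [9.0, 9.1, 2.0],
])
print(select_genes(X, y, k=2))
```

Plain top-k ranking would return the two near-duplicate genes here, whereas the filtered selection keeps only one of them. Note that genes which both separate the classes are correlated with each other through the class label, so a threshold that is too strict can discard informative genes.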
