Peng Yanxiong, Li Wenyuan, Liu Ying
Laboratory for Bioinformatics and Medical Informatics, University of Texas at Dallas, Richardson, TX 75083-0688, USA.
Cancer Inform. 2007 Feb 22;2:301-11.
Microarrays allow researchers to monitor the expression patterns of tens of thousands of genes across a wide range of cellular responses, phenotypes, and conditions. Selecting a small subset of discriminative genes from these thousands is important for accurate classification of diseases and phenotypes. Many methods have been proposed to find subsets of genes with maximum relevance and minimum redundancy that can accurately distinguish between samples with different labels. Finding the minimum subset of relevant genes is often referred to as biomarker discovery. Two main approaches, filter and wrapper techniques, have been applied to biomarker discovery. In this paper, we conduct a comparative study of different biomarker discovery methods, including six filter methods and three wrapper methods. We then propose a hybrid approach, FR-Wrapper, for biomarker discovery. The aim of this approach is to find an optimum balance between the precision of biomarker discovery and the computational cost by taking advantage of both the efficiency of filter methods and the high accuracy of wrapper methods. Our hybrid approach first applies Fisher's ratio, a simple method that is easy to understand and implement, to filter out most of the irrelevant genes; a wrapper method is then employed to reduce redundancy among the remaining genes. The performance of the FR-Wrapper approach is evaluated on four widely used microarray datasets. Analysis of the experimental results indicates that the hybrid approach can achieve the goal of maximum relevance with minimum redundancy.
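The two-stage idea (a Fisher's ratio filter followed by a wrapper) can be illustrated with a minimal sketch. The abstract does not specify the exact Fisher's ratio formula, the wrapper search strategy, or the classifier, so everything below is an assumption chosen for illustration: the common two-class form of Fisher's ratio, (m0 - m1)^2 / (v0 + v1), greedy forward selection as the wrapper, and leave-one-out accuracy of a nearest-centroid classifier as the selection criterion. The names `fisher_ratio`, `loo_accuracy`, `fr_wrapper_sketch`, and `top_k` are hypothetical and do not come from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher's ratio for one gene (assumed form):
    squared difference of class means over the sum of class variances."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else 0.0


def loo_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier that
    uses only the columns listed in `genes`.
    Assumes every class has at least two samples."""
    classes = sorted(set(labels))
    correct = 0
    for i, row in enumerate(X):
        best_class, best_dist = None, float("inf")
        for c in classes:
            # Training rows of class c, excluding the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist = sum((row[g] - m) ** 2 for g, m in zip(genes, centroid))
            if dist < best_dist:
                best_class, best_dist = c, dist
        correct += best_class == labels[i]
    return correct / len(X)


def fr_wrapper_sketch(X, labels, top_k=50):
    """Filter stage: keep the top_k genes by Fisher's ratio.
    Wrapper stage: greedily add the gene that most improves
    leave-one-out accuracy; stop when no gene improves it."""
    n_genes = len(X[0])
    scores = [fisher_ratio([row[g] for row in X], labels) for g in range(n_genes)]
    candidates = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]

    selected, best_acc = [], 0.0
    while candidates:
        trials = [(loo_accuracy(X, labels, selected + [g]), g) for g in candidates]
        # Ties go to the gene with the higher Fisher's ratio (earlier in the list).
        acc, gene = max(trials, key=lambda t: t[0])
        if acc <= best_acc:
            break  # adding any remaining gene would be redundant
        selected.append(gene)
        candidates.remove(gene)
        best_acc = acc
    return selected, best_acc
```

Here `X` is a list of samples (each a list of expression values, one per gene) and `labels` holds 0 or 1 per sample. The filter stage is cheap because each gene is scored independently, while the wrapper stage only has to search among the `top_k` survivors; a gene that merely duplicates an already selected one does not raise accuracy and is therefore left out.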