Huerta Edmundo Bonilla, Duval Béatrice, Hao Jin-Kao
LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers, France.
Genomics Proteomics Bioinformatics. 2008 Jun;6(2):61-73. doi: 10.1016/S1672-0229(08)60021-2.
Gene subset selection is essential for the classification and analysis of microarray data. However, gene selection is known to be a very difficult task, since gene expression data are not only high-dimensional but also contain redundant information and noise. To cope with these difficulties, this paper introduces a fuzzy-logic-based pre-processing approach composed of two main steps. First, we use fuzzy inference rules to transform the gene expression levels of a given dataset into fuzzy values. Then we apply a similarity relation to these fuzzy values to define fuzzy equivalence groups, each group containing strongly similar genes. Dimension reduction is achieved by keeping, for each group of similar genes, a single representative chosen on the basis of mutual information. To assess the usefulness of this approach, extensive experiments were carried out on three well-known public datasets with a combined classification model using three statistical filters and three classifiers.
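The two pre-processing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the inference rules, the similarity relation, or the thresholds, so everything concrete here is an assumption. Min-max scaling stands in for the fuzzy inference rules, similarity is taken as one minus the mean absolute difference of fuzzy values, equivalence groups come from an alpha-cut of the max-min transitive closure, and mutual information is computed against the class labels after binning each gene into three levels. All function names and the `alpha` parameter are hypothetical.

```python
import numpy as np

def fuzzify(X):
    """Map each gene (column) to [0, 1]. ASSUMPTION: min-max scaling
    stands in for the paper's fuzzy inference rules."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant genes: avoid division by zero
    return (X - lo) / span

def similarity(F):
    """ASSUMED similarity relation: 1 - mean absolute difference
    between the fuzzy profiles of two genes (reflexive, symmetric)."""
    diff = np.abs(F[:, :, None] - F[:, None, :])  # samples x genes x genes
    return 1.0 - diff.mean(axis=0)

def maxmin_closure(S):
    """Max-min transitive closure, turning a similarity relation
    into a fuzzy equivalence relation."""
    R = np.asarray(S, dtype=float).copy()
    while True:
        # (R o R)[i, j] = max over k of min(R[i, k], R[k, j])
        comp = np.minimum(R[:, :, None], R[None, :, :]).max(axis=1)
        R_next = np.maximum(R, comp)
        if np.array_equal(R_next, R):
            return R_next
        R = R_next

def alpha_groups(R, alpha):
    """Equivalence classes of the alpha-cut {(i, j) : R[i, j] >= alpha}."""
    assigned = np.zeros(R.shape[0], dtype=bool)
    groups = []
    for i in range(R.shape[0]):
        if not assigned[i]:
            members = np.flatnonzero(R[i] >= alpha)
            assigned[members] = True
            groups.append(members.tolist())
    return groups

def mutual_information(a, b):
    """Mutual information (in bits) between two discrete vectors."""
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def select_representatives(X, y, alpha=0.9):
    """Return one gene index per group: the gene with the highest
    mutual information with the class labels y."""
    F = fuzzify(X)
    R = maxmin_closure(similarity(F))
    levels = np.digitize(F, [1 / 3, 2 / 3])  # ASSUMED low / medium / high bins
    y = np.asarray(y)
    reps = []
    for group in alpha_groups(R, alpha):
        scores = [mutual_information(levels[:, j], y) for j in group]
        reps.append(group[int(np.argmax(scores))])
    return reps
```

For example, with an expression matrix `X` of shape (samples, genes) and a label vector `y`, `select_representatives(X, y, alpha=0.9)` returns the indices of the retained genes. The pairwise similarity array grows quadratically with the number of genes and the closure step cubically, so this sketch is only practical for small gene sets.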