Dessì Nicoletta, Pascariello Emanuele, Pes Barbara
Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Via Ospedale 72, 09124 Cagliari, Italy.
Biomed Res Int. 2013;2013:387673. doi: 10.1155/2013/387673. Epub 2013 Nov 10.
Feature selection has become an essential step in biomarker discovery from high-dimensional genomics data. It is recognized that different feature selection techniques may result in different sets of biomarkers, that is, different groups of genes highly correlated to a given pathological condition, but few direct comparisons exist that quantify these differences in a systematic way. In this paper, we propose a general methodology for comparing the outcomes of different selection techniques in the context of biomarker discovery. The comparison is carried out along two dimensions: (i) measuring the similarity/dissimilarity of selected gene sets; (ii) evaluating the implications of these differences in terms of both predictive performance and stability of selected gene sets. As a case study, we considered three benchmarks deriving from DNA microarray experiments and conducted a comparative analysis among eight selection methods, representative of different classes of feature selection techniques. Our results show that the proposed approach can provide useful insight into the pattern of agreement of biomarker discovery techniques.
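The two comparison dimensions in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity or stability measures used in the paper, so the Jaccard index here is an assumption chosen for illustration, as is the idea of estimating stability as the average pairwise similarity of gene sets selected on different resamples of the data. The function names, method names, and gene identifiers are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: |A & B| / |A | B| (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def pairwise_similarity(selections):
    """Similarity between the gene sets chosen by each pair of methods.

    `selections` maps a (hypothetical) method name to the genes it selected.
    Keys of the result are method-name pairs in alphabetical order.
    """
    return {
        (m1, m2): jaccard(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }


def stability(subsets):
    """Average pairwise Jaccard index over gene sets that one method
    selected on different resamples of the same dataset (assumed estimator)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two gene sets to estimate stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: two methods, and three resampled runs of one method.
selected = {"t_test": ["g1", "g2"], "relief": ["g2", "g3"]}
similarities = pairwise_similarity(selected)
runs = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
run_stability = stability(runs)
```

A value near 1 means two methods (or two runs of the same method) pick largely the same genes, and a value near 0 means they barely overlap. Other set-similarity or rank-based measures could be swapped in for `jaccard` without changing the rest of the sketch.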