Barrett Andrea B, Phan John H, Wang May D
Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30318, USA.
Annu Int Conf IEEE Eng Med Biol Soc. 2008;2008:5660-3. doi: 10.1109/IEMBS.2008.4650498.
Microarray technology makes it possible to measure the expression of thousands of genes simultaneously. With this high-throughput data, we can examine subtle genetic changes between biological samples and build predictive models for clinical applications. Although microarrays have dramatically increased the rate of data collection, small sample size remains a major obstacle to feature selection. Previous work has shown that combining microarray datasets improves gene selection when z-scores and fold change are used as selection criteria. We propose a wrapper-based gene selection technique that combines bootstrap-estimated classification errors for individual genes across multiple datasets. The bootstrap is an unbiased estimator of classification error and has been shown to be effective for small-sample data. We show that this meta-analytic approach, which couples bootstrap error estimation with data combination across multiple datasets, improves gene selection.
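The abstract does not specify the classifier, the bootstrap variant, or the rule used to combine errors across datasets, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a one-gene nearest-centroid classifier, an out-of-bag bootstrap error estimate, and a sample-size-weighted average of per-dataset errors. The function names `bootstrap_gene_error` and `combined_gene_ranking` are hypothetical, and the sketch also assumes every dataset measures the same genes in the same column order with binary labels coded 0 and 1.

```python
import numpy as np


def bootstrap_gene_error(x, y, n_boot=100, rng=None):
    """Out-of-bag bootstrap error of a one-gene nearest-centroid classifier.

    x: 1-D array of expression values for one gene.
    y: 1-D array of binary labels (0 or 1).
    Assumption: the classifier and the out-of-bag scheme are illustrative
    choices; the source abstract does not name them.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # samples left out of the resample
        x_tr, y_tr = x[idx], y[idx]
        # Skip degenerate resamples: nothing to test on, or only one class.
        if oob.size == 0 or np.unique(y_tr).size < 2:
            continue
        c0 = x_tr[y_tr == 0].mean()             # class-0 centroid
        c1 = x_tr[y_tr == 1].mean()             # class-1 centroid
        pred = (np.abs(x[oob] - c1) < np.abs(x[oob] - c0)).astype(int)
        errors.append(np.mean(pred != y[oob]))
    return float(np.mean(errors)) if errors else float("nan")


def combined_gene_ranking(datasets, n_boot=100, seed=0):
    """Rank genes by their bootstrap error combined across datasets.

    datasets: list of (X, y) pairs, X of shape (n_samples, n_genes).
    Assumption: errors are combined by a sample-size-weighted average.
    Returns (gene indices sorted from lowest to highest combined error,
    combined error per gene).
    """
    rng = np.random.default_rng(seed)
    n_genes = datasets[0][0].shape[1]
    weights = np.array([len(y) for _, y in datasets], dtype=float)
    weights /= weights.sum()
    combined = np.zeros(n_genes)
    for w, (X, y) in zip(weights, datasets):
        errs = np.array(
            [bootstrap_gene_error(X[:, g], y, n_boot, rng) for g in range(n_genes)]
        )
        combined += w * errs
    return np.argsort(combined), combined
```

Calling `combined_gene_ranking([(X1, y1), (X2, y2)])` returns the gene indices ordered by combined error. Weighting by sample size lets larger studies contribute more to the combined score; other combination rules, such as an unweighted mean or a rank-based aggregate, would fit the same structure.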