Dettling Marcel
Seminar für Statistik, ETH Zürich, CH-8092 Switzerland.
Bioinformatics. 2004 Dec 12;20(18):3583-93. doi: 10.1093/bioinformatics/bth447. Epub 2004 Oct 5.
Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. They create a need for class prediction tools that can handle a large number of highly correlated input variables, perform feature selection, and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting into a novel algorithm called BagBoosting.
When bagging is used as a module in boosting, the resulting classifier consistently improves on both bagging and boosting, in predictive performance as well as in the quality of its probability estimates, on real and simulated gene expression data. This quasi-guaranteed improvement is obtained simply at the price of a larger computational effort. Its favorable predictive potential is further confirmed by comparing BagBoosting with several established class prediction tools for microarray data.
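The abstract only states that bagging is used as a module inside boosting. The sketch below illustrates that idea under assumptions of our own, not the author's exact procedure: binary 0/1 labels, the two-class LogitBoost update of Friedman, Hastie and Tibshirani (2000) as the boosting scheme, weighted least-squares regression stumps as base learners, and `n_bags` bootstrap replicates per boosting iteration whose fitted stumps are averaged into one base learner. The function names (`bagboost_fit`, `bagboost_predict_proba`, `fit_stump`, `prob_from_F`), the defaults and the clipping bound `z_max` are hypothetical and are not the interface of the R package; no separate gene preselection is performed, so variable selection happens only implicitly through the feature each stump splits on.

```python
import math
import random


def prob_from_F(F):
    """Two-class LogitBoost link p = exp(F) / (exp(F) + exp(-F)), computed stably."""
    t = 2.0 * F
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_stump(X, z, w, idx):
    """Weighted least-squares regression stump on the rows in idx (may repeat).

    Returns (feature, threshold, left_value, right_value).
    """
    best = None
    for j in range(len(X[0])):
        rows = sorted(idx, key=lambda i: X[i][j])
        tot_w = sum(w[i] for i in rows)
        tot_wz = sum(w[i] * z[i] for i in rows)
        lw = lwz = 0.0
        for k in range(len(rows) - 1):
            i = rows[k]
            lw += w[i]
            lwz += w[i] * z[i]
            a, b = X[i][j], X[rows[k + 1]][j]
            if a == b:
                continue  # cannot split between tied values
            rw, rwz = tot_w - lw, tot_wz - lwz
            # Minimising the weighted SSE is equivalent to maximising this score.
            score = lwz * lwz / lw + rwz * rwz / rw
            if best is None or score > best[0]:
                best = (score, j, (a + b) / 2.0, lwz / lw, rwz / rw)
    if best is None:  # every sampled row identical: fall back to a constant
        c = sum(w[i] * z[i] for i in idx) / sum(w[i] for i in idx)
        return (0, math.inf, c, c)
    return best[1:]


def stump_predict(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right


def bagged_predict(bag, x):
    """Base learner of one boosting step: the average of its bagged stumps."""
    return sum(stump_predict(s, x) for s in bag) / len(bag)


def bagboost_fit(X, y, n_iter=20, n_bags=10, z_max=4.0, seed=0):
    """Two-class LogitBoost whose base learner is a bagged set of stumps.

    y holds 0/1 labels. Returns one list of stumps (a bag) per iteration.
    """
    rng = random.Random(seed)
    n = len(X)
    F = [0.0] * n
    model = []
    for _ in range(n_iter):
        p = [prob_from_F(f) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        # LogitBoost working response, clipped to [-z_max, z_max] for stability.
        z = [max(-z_max, min(z_max, (yi - pi) / wi))
             for yi, pi, wi in zip(y, p, w)]
        # Bagging as the module inside boosting: fit one stump per bootstrap
        # sample and average them instead of fitting a single stump.
        bag = []
        for _ in range(n_bags):
            idx = [rng.randrange(n) for _ in range(n)]
            bag.append(fit_stump(X, z, w, idx))
        model.append(bag)
        for i in range(n):
            F[i] += 0.5 * bagged_predict(bag, X[i])
    return model


def bagboost_predict_proba(model, x):
    """Estimated P(y = 1 | x) from the additive model F(x)."""
    F = sum(0.5 * bagged_predict(bag, x) for bag in model)
    return prob_from_F(F)


if __name__ == "__main__":
    # Toy data: the label depends on feature 0 only; feature 1 is noise.
    X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.5], [0.4, 0.7],
         [0.6, 0.2], [0.7, 0.8], [0.8, 0.4], [0.9, 0.6]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model = bagboost_fit(X, y, n_iter=20, n_bags=10, seed=1)
    print([round(bagboost_predict_proba(model, x), 3) for x in X])
```

In this sketch the extra computation is explicit: each boosting step fits `n_bags` stumps instead of one, and `bagboost_predict_proba` returns the class probability estimate rather than only a hard label.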
Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under the GNU General Public License at http://stat.ethz.ch/~dettling/bagboost.html.