Liu Meizhu, Vemuri Baba C
Department of CISE, University of Florida, Gainesville, FL 32611.
Proc IEEE Int Symp Biomed Imaging. 2011 Mar 30;2011:1831-1834. doi: 10.1109/ISBI.2011.5872763.
Boosting is a versatile machine learning technique with numerous applications, including image processing, computer vision, and data mining. It is based on the premise that the classification performance of a set of weak learners can be boosted by some weighted combination of them. A number of boosting methods have been proposed in the literature, such as AdaBoost, LPBoost, SoftBoost, and their variations. However, the learning update strategies used in these methods usually lead to overfitting and instabilities in classification accuracy. Improved boosting methods that use regularization can overcome such difficulties. In this paper, we propose a Riemannian distance regularized LPBoost, dubbed RBoost. RBoost uses the Riemannian distance between two square-root densities, which represent the distribution over the training data and the classification error respectively, to regularize the error distribution in an iterative update formula. Since this distance is available in closed form, RBoost requires much less computation than other regularized boosting algorithms. We present several experimental results comparing the performance of our algorithm with recently published methods, LPBoost and CAVIAR, on a variety of datasets, including the publicly available OASIS database, an in-house epilepsy database, and the well-known UCI repository. The results show that RBoost performs better than the competing methods in terms of accuracy and efficiency.
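The abstract mentions a closed-form Riemannian distance between square-root densities but does not state the formula. As a rough illustration only: under the square-root parameterization, a discrete distribution maps to a point on the unit hypersphere, and the geodesic distance between two such points is the arc cosine of their inner product. The sketch below assumes this standard form; the paper's exact convention (for instance, any scaling factor) and the way the distance enters the LPBoost update are not given here, and the function names `sqrt_density` and `riemannian_distance` are hypothetical.

```python
import math


def sqrt_density(weights):
    """Normalize non-negative weights to sum to 1, then take square roots.

    The result is a unit-norm vector, i.e. a point on the unit hypersphere.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [math.sqrt(w / total) for w in weights]


def riemannian_distance(p, q):
    """Geodesic (arc-length) distance on the unit hypersphere between the
    square-root representations of two discrete distributions p and q.

    Assumed form: arccos(sum_i sqrt(p_i) * sqrt(q_i)).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    psi_p = sqrt_density(p)
    psi_q = sqrt_density(q)
    inner = sum(a * b for a, b in zip(psi_p, psi_q))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    inner = min(1.0, max(-1.0, inner))
    return math.acos(inner)


if __name__ == "__main__":
    uniform = [1, 1, 1, 1]        # e.g. uniform weights over four training samples
    skewed = [4, 1, 1, 1]         # hypothetical error-weighted distribution
    print(riemannian_distance(uniform, uniform))
    print(riemannian_distance(uniform, skewed))
```

Because the distance reduces to one inner product and one `acos` call, its cost is linear in the number of training samples, which is consistent with the abstract's point that a closed-form distance keeps the regularization cheap.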