Shen Chunhua, Li Hanxi
NICTA, Canberra Research Laboratory, Canberra, Australia.
IEEE Trans Neural Netw. 2010 Apr;21(4):659-66. doi: 10.1109/TNN.2010.2040484. Epub 2010 Feb 17.
Boosting has been of great interest recently in the machine learning community because of its impressive performance on classification and regression problems. The success of boosting algorithms may be interpreted in terms of the margin theory. Recently, it has been shown that bounds on the generalization error of classifiers can be obtained by explicitly taking the margin distribution of the training data into account. Most current boosting algorithms in practice optimize a convex loss function and do not make use of the margin distribution. In this brief, we design a new boosting algorithm, termed margin-distribution boosting (MDBoost), which directly maximizes the average margin and minimizes the margin variance at the same time. In this way, the margin distribution is optimized. A totally corrective optimization algorithm based on column generation is proposed to implement MDBoost. Experiments on various data sets show that MDBoost outperforms AdaBoost and LPBoost in most cases.
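The abstract does not include an implementation, so the following is a minimal Python sketch of the MDBoost idea, not the authors' reference code. It assumes decision stumps as weak learners, a trade-off parameter D between average margin and margin variance, and SciPy's SLSQP solver standing in for the paper's quadratic program; the column-generation loop (pick the weak learner with the largest edge, then re-solve the totally corrective step over all selected learners) follows the abstract's description, while the stump oracle, the sample-weight update, and all names below are illustrative assumptions.

```python
# Minimal MDBoost-style sketch (assumptions: stump weak learners,
# objective mean(rho) - D/2 * var(rho), SLSQP as the QP solver).
import numpy as np
from scipy.optimize import minimize

def stump_outputs(X, feature, thresh, sign):
    # A decision stump: +/-1 depending on one feature and a threshold.
    return sign * np.where(X[:, feature] <= thresh, 1.0, -1.0)

def best_stump(X, y, u):
    # Weak-learner oracle: pick the stump maximizing the edge sum_i u_i y_i h(x_i).
    best, best_edge = None, -np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1.0, -1.0):
                edge = np.dot(u, y * stump_outputs(X, f, t, s))
                if edge > best_edge:
                    best, best_edge = (f, t, s), edge
    return best

def mdboost(X, y, D=1.0, iters=30):
    n = len(y)
    stumps, H = [], np.empty((n, 0))   # H[i, j] = y_i * h_j(x_i): per-sample margins
    u = np.ones(n) / n                 # sample weights fed to the weak-learner oracle
    w = np.empty(0)
    for _ in range(iters):
        stump = best_stump(X, y, u)    # column generation: add one weak learner
        stumps.append(stump)
        H = np.column_stack([H, y * stump_outputs(X, *stump)])
        T = H.shape[1]

        # Totally corrective step: re-optimize the weights of ALL selected
        # learners, maximizing the average margin minus D/2 times its variance.
        def neg_objective(w):
            rho = H @ w
            return -(rho.mean() - 0.5 * D * rho.var())

        res = minimize(neg_objective, x0=np.ones(T) / T, method="SLSQP",
                       bounds=[(0, None)] * T,
                       constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
        w = res.x
        rho = H @ w
        # Heuristic re-derivation of sample weights from the objective's
        # gradient w.r.t. the margins (a stand-in for the QP's dual variables).
        grad = (1.0 / n) - D * (rho - rho.mean()) / n
        u = np.clip(grad, 1e-12, None)
        u /= u.sum()
    return stumps, w
```

To classify a new point, take the sign of the weighted sum of the stump outputs under the learned w. The totally corrective step is what separates this scheme from stage-wise methods such as AdaBoost: at every iteration the weights of all previously selected weak learners are re-optimized rather than frozen.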