IEEE Trans Neural Netw Learn Syst. 2017 Dec;28(12):2985-2997. doi: 10.1109/TNNLS.2016.2609466. Epub 2016 Sep 30.
Classification algorithms have traditionally been designed to simultaneously reduce errors caused by bias as well as by variance. However, many situations arise in which a low generalization error is crucial to obtaining practically usable classification solutions, and even slight overfitting has serious consequences for the test results. In such situations, classifiers with a low Vapnik-Chervonenkis (VC) dimension offer two main advantages: 1) the classifier keeps the test error close to the training error and 2) the classifier learns effectively from a small number of samples. This paper shows that a class of classifiers named majority vote point (MVP) classifiers, on account of their very low VC dimension, can exhibit a generalization error even lower than that of linear classifiers. The paper first derives a theoretical upper bound on the VC dimension of the MVP classifier. It then estimates the trend of the exact values of the VC dimension through empirical analysis. Finally, case studies on machine fault diagnosis problems and a prostate tumor detection problem confirm that an MVP classifier can achieve a lower generalization error than most other classifiers.
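The link the abstract draws between low VC dimension and the two stated advantages follows the standard VC generalization bound from statistical learning theory; the bound below is not taken from the paper itself but is the classic Vapnik result, shown here as an illustrative sketch. Here $h$ denotes the VC dimension, $n$ the number of training samples, and $1-\eta$ the confidence level.

% Classic VC bound: with probability at least 1 - \eta, the test risk R(f)
% of a classifier f with VC dimension h, trained on n samples, is bounded by
% its empirical (training) risk plus a capacity term that grows with h.
\[
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
\]
% A smaller h shrinks the capacity term, which keeps the test error close to
% the training error and permits reliable learning from fewer samples.

A classifier with very low $h$, such as the MVP classifier described above, therefore tightens the gap between test and training error even when $n$ is small, which is precisely the regime the abstract targets.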