Paredes Roberto, Vidal Enrique
Departamento de Sistemas Informáticos y Computación, Instituto Tecnológico de Informática, Universidad Politécnica de Valencia, Spain.
IEEE Trans Pattern Anal Mach Intell. 2006 Jul;28(7):1100-10. doi: 10.1109/TPAMI.2006.145.
In order to optimize the accuracy of the Nearest-Neighbor classification rule, a weighted distance is proposed, along with algorithms to automatically learn the corresponding weights. These weights may be specific for each class and feature, for each individual prototype, or for both. The learning algorithms are derived by (approximately) minimizing the Leaving-One-Out classification error of the given training set. The proposed approach is assessed through a series of experiments with UCI/STATLOG corpora, as well as with a more specific task of text classification which entails very sparse data representation and huge dimensionality. In all these experiments, the proposed approach shows a uniformly good behavior, with results comparable to or better than state-of-the-art results published with the same data so far.
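The abstract does not give the exact form of the weighted distance or of the learning algorithms, so the following is only a minimal sketch under stated assumptions. It assumes a weighted Euclidean distance in which each prototype's class supplies one weight per feature and each prototype also carries its own scalar weight. It then evaluates the leave-one-out (LOO) nearest-neighbor error for fixed weights. The names `weighted_distance` and `loo_error` and the toy data are hypothetical, and the weight-learning step that the paper derives is not reproduced here.

```python
import math

def weighted_distance(y, x, feature_w, prototype_w):
    """Assumed form: prototype weight times a feature-weighted Euclidean distance."""
    s = sum((w * (a - b)) ** 2 for w, a, b in zip(feature_w, y, x))
    return prototype_w * math.sqrt(s)

def loo_error(X, labels, class_w, proto_w):
    """Leave-one-out 1-NN error rate for fixed weights.

    class_w[c] holds the per-feature weights of class c (hypothetical layout);
    proto_w[k] is the scalar weight of prototype k.
    """
    errors = 0
    for i, y in enumerate(X):
        best_d, best_label = math.inf, None
        for k, x in enumerate(X):
            if k == i:
                continue  # leave the query sample out of the prototype set
            d = weighted_distance(y, x, class_w[labels[k]], proto_w[k])
            if d < best_d:
                best_d, best_label = d, labels[k]
        errors += best_label != labels[i]
    return errors / len(X)

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [(0.0, 0.0), (0.1, 10.0), (1.0, 0.2), (1.1, 10.1)]
labels = [0, 0, 1, 1]
proto_w = [1.0] * len(X)

uniform = {0: [1.0, 1.0], 1: [1.0, 1.0]}     # plain Euclidean distance
feature0 = {0: [1.0, 0.0], 1: [1.0, 0.0]}    # noisy feature switched off

print(loo_error(X, labels, uniform, proto_w))   # 1.0
print(loo_error(X, labels, feature0, proto_w))  # 0.0
```

On this toy set the unweighted rule misclassifies every held-out sample, while zeroing the weight of the noisy feature classifies all of them correctly. This gap in LOO error is the kind of quantity the learning algorithms are described as (approximately) minimizing.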