Vidyarthi Ankit, Mittal Namita
Department of Computer Science and Engineering, Malaviya National Institute of Technology Jaipur, Rajasthan 302017, India.
Comput Methods Programs Biomed. 2016 Dec;137:195-201. doi: 10.1016/j.cmpb.2016.08.015. Epub 2016 Sep 26.
In machine learning, the accuracy of a system depends upon its classification results, and classification accuracy plays an imperative role across many domains. Non-parametric classifiers such as K-Nearest Neighbor (KNN) are among the most widely used classifiers for pattern analysis. Despite its simplicity, ease of use, and effectiveness, the main problem associated with the KNN classifier is the selection of the number of nearest neighbors, i.e. "k", used in the computation. At present, no statistical algorithm reliably finds the optimal value of "k", i.e. the value that yields the lowest misclassification error rate.
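The sensitivity to "k" described above can be seen in a minimal from-scratch KNN sketch (the dataset and function names here are illustrative, not from the paper): the same query point receives a different label depending on the chosen "k".

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset where the prediction flips with k: one nearby "A"
# point, but two slightly farther "B" points near the query.
train = [((0.0, 0.0), "A"), ((1.0, 0.1), "B"), ((1.1, 0.0), "B"),
         ((3.0, 3.0), "A"), ((3.1, 3.1), "A")]
query = (0.4, 0.0)

print(knn_predict(train, query, k=1))  # → A (single nearest neighbor)
print(knn_predict(train, query, k=3))  # → B (majority of 3 neighbors)
```

With k=1 the lone nearest "A" point decides the label; with k=3 the two "B" points outvote it, illustrating why the choice of "k" directly drives the misclassification rate.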
Motivated by this problem, a new sample-space-reduction weighted-voting mathematical rule (AVNM) is proposed for classification in machine learning. Like KNN, the proposed AVNM rule is non-parametric. AVNM uses a weighted-voting mechanism with sample-space reduction to learn and predict the class label of an unknown sample. Unlike the KNN algorithm, AVNM requires no initial choice of a predefined variable or number of neighbors. The proposed classifier also reduces the effect of outliers.
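The abstract does not give AVNM's exact formulation, but the general idea of a parameter-free rule combining sample-space reduction with weighted voting can be sketched as follows. This is a hypothetical illustration only: the mean-distance cutoff and the inverse-distance weights are assumptions chosen for the sketch, not the paper's actual rule.

```python
from collections import defaultdict
import math

def weighted_vote_predict(train, query):
    """Hypothetical sketch of a parameter-free weighted-voting rule.

    Illustrative only (not the paper's AVNM formula): samples farther
    from the query than the mean distance are dropped (a simple
    sample-space reduction), and each surviving sample casts a vote
    weighted by inverse distance, so no "k" needs to be chosen and
    distant outliers carry little or no weight.
    """
    dists = [(math.dist(x, query), label) for x, label in train]
    mean_d = sum(d for d, _ in dists) / len(dists)
    # Sample-space reduction: keep only samples within the mean distance.
    kept = [(d, label) for d, label in dists if d <= mean_d]
    scores = defaultdict(float)
    for d, label in kept:
        scores[label] += 1.0 / (d + 1e-9)  # inverse-distance weight
    return max(scores, key=scores.get)

train = [((0.0, 0.0), "A"), ((1.0, 0.1), "B"), ((1.1, 0.0), "B"),
         ((3.0, 3.0), "A"), ((3.1, 3.1), "A")]
print(weighted_vote_predict(train, (0.4, 0.0)))  # → B
print(weighted_vote_predict(train, (0.1, 0.0)))  # → A
```

Note how the far-away points are excluded by the reduction step before voting, which is the outlier-suppression property the abstract attributes to AVNM.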
To verify the performance of the proposed AVNM classifier, experiments were conducted on 10 standard datasets taken from the UCI repository and one manually created dataset. The experimental results show that the proposed AVNM rule outperforms the KNN classifier and its variants: accuracy computed from the confusion matrix is consistently higher with the AVNM rule.
The proposed AVNM rule uses a sample-space-reduction mechanism to identify the optimal number of nearest neighbors. AVNM achieves better classification accuracy and a lower error rate than the state-of-the-art KNN algorithm and its variants. The proposed rule automates nearest-neighbor selection and improves the classification rate on both the UCI datasets and the manually created dataset.