IT Department, Mu'tah University, Mutah-Karak, Jordan.
PLoS One. 2018 Nov 26;13(11):e0207772. doi: 10.1371/journal.pone.0207772. eCollection 2018.
Big data classification is very slow with traditional machine learning classifiers, particularly with a lazy, inherently slow classifier such as the k-nearest neighbors algorithm (KNN). This paper proposes a new approach that sorts the feature vectors of the training data into a binary search tree to accelerate big data classification with KNN. Two methods are presented, both of which use two local points and sort the examples according to their similarity to these points. The first method chooses the local points based on their similarity to the global extreme points, while the second method chooses them randomly. Experiments conducted on a range of big datasets show reasonable accuracy compared with state-of-the-art methods and with the KNN classifier itself. More importantly, they show that both methods classify at high speed. This speed advantage can be used to further improve the accuracy of the proposed methods.
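The idea of sorting examples by their similarity to two local points can be illustrated with a minimal Python sketch of the random-point variant. It is not the authors' implementation: the abstract does not specify the distance measure, the stopping rule, or how a leaf is searched, so Euclidean distance, the `leaf_size` threshold, and the majority vote inside a single leaf are all assumptions, as are the names `Node`, `build_tree`, and `classify`.

```python
import math
import random
from collections import Counter


class Node:
    """Internal nodes hold two local points; leaves hold labelled examples."""

    def __init__(self, p1=None, p2=None, left=None, right=None, examples=None):
        self.p1, self.p2 = p1, p2
        self.left, self.right = left, right
        self.examples = examples  # list of (vector, label); None on internal nodes


def build_tree(examples, leaf_size=8, rng=None):
    """Recursively split examples by which of two random local points is closer.

    Assumption: splitting stops once a node holds at most `leaf_size` examples.
    """
    rng = rng or random.Random(0)
    if len(examples) <= leaf_size:
        return Node(examples=examples)
    # Random variant: draw the two local points from the node's own examples.
    (p1, _), (p2, _) = rng.sample(examples, 2)
    left = [e for e in examples if math.dist(e[0], p1) <= math.dist(e[0], p2)]
    right = [e for e in examples if math.dist(e[0], p1) > math.dist(e[0], p2)]
    if not left or not right:
        # The two points coincide (duplicate vectors), so no split is possible.
        return Node(examples=examples)
    return Node(p1, p2,
                build_tree(left, leaf_size, rng),
                build_tree(right, leaf_size, rng))


def classify(root, x, k=3):
    """Descend to one leaf, then take a majority vote among the k nearest there."""
    node = root
    while node.examples is None:
        closer_to_p1 = math.dist(x, node.p1) <= math.dist(x, node.p2)
        node = node.left if closer_to_p1 else node.right
    nearest = sorted(node.examples, key=lambda e: math.dist(x, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy usage on a 10 x 10 grid of points split into two classes.
train = [((float(i), float(j)), "a" if i < 5 else "b")
         for i in range(10) for j in range(10)]
tree = build_tree(train, leaf_size=4)
prediction = classify(tree, (2.0, 7.0), k=1)
```

A query is compared with only two points per tree level and then with the few examples stored in one leaf, which is where the speed-up over a full KNN scan would come from. The trade-off is that true nearest neighbors lying in a different leaf are never examined, which is consistent with the abstract reporting reasonable rather than identical accuracy. The first method in the abstract would replace the `rng.sample` line with a selection based on similarity to the global extreme points; that rule is not detailed in the abstract and is not sketched here.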