Two-point-based binary search trees for accelerating big data classification using KNN.

Affiliation

IT Department, Mu'tah University, Mutah-Karak, Jordan.

Publication information

PLoS One. 2018 Nov 26;13(11):e0207772. doi: 10.1371/journal.pone.0207772. eCollection 2018.

DOI:10.1371/journal.pone.0207772

PMID:30475862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6257916/

Abstract

Big data classification is very slow when using traditional machine learning classifiers, particularly when using a lazy and slow-by-nature classifier such as the k-nearest neighbors algorithm (KNN). This paper proposes a new approach which is based on sorting the feature vectors of training data in a binary search tree to accelerate big data classification using the KNN approach. This is done using two methods, both of which utilize two local points to sort the examples based on their similarity to these local points. The first method chooses the local points based on their similarity to the global extreme points, while the second method chooses the local points randomly. The results of various experiments conducted on different big datasets show reasonable accuracy rates compared to state-of-the-art methods and the KNN classifier itself. More importantly, they show the high classification speed of both methods. This strong trait can be used to further improve the accuracy of the proposed methods.
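The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the stopping rule, or how the final neighbors are chosen, so the code below assumes Euclidean distance, a hypothetical `leaf_size` threshold, and a plain KNN vote among the examples stored in the leaf that the query reaches. It follows the second method (local points chosen at random); the names `build_tree`, `classify`, and `Node` are illustrative only.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance (an assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """Inner nodes hold two local points; leaves hold training examples."""

    def __init__(self, p1=None, p2=None, left=None, right=None, examples=None):
        self.p1, self.p2 = p1, p2
        self.left, self.right = left, right
        self.examples = examples  # None for inner nodes


def build_tree(examples, leaf_size=8, rng=None):
    """Recursively split (vector, label) pairs around two random local points."""
    rng = rng or random.Random(0)
    if len(examples) <= leaf_size:
        return Node(examples=examples)
    (p1, _), (p2, _) = rng.sample(examples, 2)
    left, right = [], []
    for vec, label in examples:
        # Each example joins the local point it is more similar to.
        side = left if euclidean(vec, p1) <= euclidean(vec, p2) else right
        side.append((vec, label))
    if not right:  # the two local points coincide, so no split is possible
        return Node(examples=examples)
    return Node(p1, p2,
                build_tree(left, leaf_size, rng),
                build_tree(right, leaf_size, rng))


def classify(root, query, k=3):
    """Descend to one leaf, then run ordinary KNN on that leaf only."""
    node = root
    while node.examples is None:
        closer_to_p1 = euclidean(query, node.p1) <= euclidean(query, node.p2)
        node = node.left if closer_to_p1 else node.right
    nearest = sorted(node.examples, key=lambda e: euclidean(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Under these assumptions, a query is compared with two points per tree level and then with at most `leaf_size` stored examples, instead of with every training example. That is the source of the speed-up, and also why accuracy can fall below exhaustive KNN: the true nearest neighbors may sit in a different leaf.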

Similar articles

Two-point-based binary search trees for accelerating big data classification using KNN.

PLoS One. 2018 Nov 26;13(11):e0207772. doi: 10.1371/journal.pone.0207772. eCollection 2018.

EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.

Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.

Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.

Big Data. 2019 Dec;7(4):221-248. doi: 10.1089/big.2018.0175. Epub 2019 Aug 14.

AVNM: A Voting based Novel Mathematical Rule for Image Classification.

Comput Methods Programs Biomed. 2016 Dec;137:195-201. doi: 10.1016/j.cmpb.2016.08.015. Epub 2016 Sep 26.

Improving the Accuracy of Ensemble Machine Learning Classification Models Using a Novel Bit-Fusion Algorithm for Healthcare AI Systems.

Front Public Health. 2022 May 4;10:858282. doi: 10.3389/fpubh.2022.858282. eCollection 2022.

Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):58. doi: 10.1186/s12911-021-01432-x.

Golden eagle based improved Att-BiLSTM model for big data classification with hybrid feature extraction and feature selection techniques.

Network. 2024 May;35(2):154-189. doi: 10.1080/0954898X.2023.2293895. Epub 2023 Dec 28.

A modified weighted mean of vectors optimizer for Chronic Kidney disease classification.

Comput Biol Med. 2023 Mar;155:106691. doi: 10.1016/j.compbiomed.2023.106691. Epub 2023 Feb 16.

Large scale biomedical texts classification: a kNN and an ESA-based approaches.

J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.

Efficient kNN Classification With Different Numbers of Nearest Neighbors.

IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1774-1785. doi: 10.1109/TNNLS.2017.2673241. Epub 2017 Apr 12.

Cited by

Development and evaluation of a novel framework to enhance k-NN algorithm's accuracy in data sparsity contexts.

Sci Rep. 2024 Oct 23;14(1):25036. doi: 10.1038/s41598-024-76909-6.

Privacy-preserving parallel kNN classification algorithm using index-based filtering in cloud computing.

PLoS One. 2022 May 5;17(5):e0267908. doi: 10.1371/journal.pone.0267908. eCollection 2022.

Deep Learning in Medical Imaging.

Neurospine. 2019 Dec;16(4):657-668. doi: 10.14245/ns.1938396.198. Epub 2019 Dec 31.

References

Efficient kNN Classification With Different Numbers of Nearest Neighbors.

IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1774-1785. doi: 10.1109/TNNLS.2017.2673241. Epub 2017 Apr 12.
