Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.

Affiliations

Department of Computer Science, Faculty of Information Technology, Mutah University, Karak, Jordan.

Department of Algorithm and Their Applications, Eötvös Loránd University, Budapest, Hungary.

Publication information

Big Data. 2019 Dec;7(4):221-248. doi: 10.1089/big.2018.0175. Epub 2019 Aug 14.

DOI:10.1089/big.2018.0175
PMID:31411491
Abstract

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with that of the most complex classifiers in the literature. At its core, the classifier depends on measuring the distance or similarity between the test examples and the training examples. This raises a major question: which of the many available distance and similarity measures should be used with the KNN classifier? This review attempts to answer that question by evaluating the performance of KNN (measured by accuracy, precision, and recall) with a large number of distance measures, tested on a number of real-world data sets, with and without different levels of added noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed nonconvex distance performed best on most data sets compared with the other tested distances. In addition, the performance of KNN with this top-performing distance degraded by only about 20% when the noise level reached 90%, and the same holds for most of the other distances used. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by added noise than others.
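Because the distance function is the ingredient the review varies, a KNN implementation can take that function as a parameter. The sketch below is illustrative and is not the review's code: it implements plain majority-vote KNN in standard-library Python with three interchangeable measures. Euclidean and Manhattan are standard; `hassanat` implements the Hassanat distance, a bounded measure whose per-dimension terms lie in [0, 1). The abstract does not name its top-performing nonconvex distance, so treating the Hassanat distance as that measure is an assumption. The names `knn_predict`, `euclidean`, `manhattan`, and `hassanat` are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(x: Vector, y: Vector) -> float:
    """L2 distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Vector, y: Vector) -> float:
    """L1 (city-block) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hassanat(x: Vector, y: Vector) -> float:
    """Hassanat distance (assumed stand-in for the abstract's unnamed
    nonconvex distance); each per-dimension term lies in [0, 1)."""
    total = 0.0
    for a, b in zip(x, y):
        lo, hi = min(a, b), max(a, b)
        if lo >= 0:
            total += 1 - (1 + lo) / (1 + hi)
        else:
            # Shift both values by |lo| so the ratio stays well defined.
            total += 1 - (1 + lo + abs(lo)) / (1 + hi + abs(lo))
    return total


def knn_predict(train_x, train_y, query, k: int = 3,
                distance: Distance = euclidean):
    """Label `query` by majority vote among its k nearest training examples."""
    # Sort training examples nearest-first under the chosen distance.
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: distance(pair[0], query))
    # Counter keeps first-seen order among equal counts, so a tied vote
    # goes to the label of the nearest neighbor involved in the tie.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with training points `[[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]` labeled `"a", "a", "b", "b"`, the call `knn_predict(train_x, train_y, [0.2, 0.1], k=3, distance=hassanat)` returns `"a"`. Swapping the `distance` argument is all it takes to compare measures on the same data, which is the kind of comparison the review performs at scale.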

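The evaluation the abstract describes (accuracy, precision, and recall, measured with and without added noise) can be sketched as a sweep over noise levels. The abstract does not say how noise was injected or whether training data, test data, or both were corrupted, so `add_noise` uses a hypothetical model: a given fraction of the training feature values is replaced by values drawn uniformly from each feature's observed range, and test data stay clean. `noise_sweep`, `add_noise`, and `scores` are illustrative names, and the KNN here is fixed to Euclidean distance for brevity.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training examples (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def add_noise(rows, level, rng):
    """Hypothetical noise model: overwrite a `level` fraction of all feature
    values with uniform draws from that feature's [min, max] range."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    noisy = [list(row) for row in rows]
    cells = [(i, j) for i in range(len(rows)) for j in range(len(lows))]
    for i, j in rng.sample(cells, round(level * len(cells))):
        noisy[i][j] = rng.uniform(lows[j], highs[j])
    return noisy


def scores(y_true, y_pred, positive):
    """Accuracy over all classes; precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def noise_sweep(train_x, train_y, test_x, test_y, levels, positive, k=3, seed=0):
    """Score KNN on clean test data after corrupting the training features."""
    rng = random.Random(seed)  # seeded for reproducible runs
    results = {}
    for level in levels:
        noisy_train = add_noise(train_x, level, rng)
        preds = [knn_predict(noisy_train, train_y, q, k) for q in test_x]
        results[level] = scores(test_y, preds, positive)
    return results
```

Calling `noise_sweep` with levels such as `[0.0, 0.3, 0.6, 0.9]` returns one `(accuracy, precision, recall)` tuple per level; repeating the sweep with a different distance inside `knn_predict` gives the kind of per-distance degradation curve the abstract summarizes.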

Similar articles

1. Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.
Big Data. 2019 Dec;7(4):221-248. doi: 10.1089/big.2018.0175. Epub 2019 Aug 14.
2. Robust Distance Measures for NN Classification of Cancer Data.
Cancer Inform. 2020 Oct 13;19:1176935120965542. doi: 10.1177/1176935120965542. eCollection 2020.
3. Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):58. doi: 10.1186/s12911-021-01432-x.
4. Ensemble Clustering Classification compete SVM and One-Class classifiers applied on plant microRNAs Data.
J Integr Bioinform. 2016 Dec 22;13(5):304. doi: 10.2390/biecoll-jib-2016-304.
5. AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Comput Methods Programs Biomed. 2016 Dec;137:195-201. doi: 10.1016/j.cmpb.2016.08.015. Epub 2016 Sep 26.
6. EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.
Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.
7. Fissures segmentation using surface features: content-based retrieval for mammographic mass using ensemble classifier.
Acad Radiol. 2011 Dec;18(12):1475-84. doi: 10.1016/j.acra.2011.08.012.
8. A Training Data Set Cleaning Method by Classification Ability Ranking for the k-Nearest Neighbor Classifier.
IEEE Trans Neural Netw Learn Syst. 2020 May;31(5):1544-1556. doi: 10.1109/TNNLS.2019.2920864. Epub 2019 Jun 28.
9. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction.
Sci Rep. 2022 Apr 15;12(1):6256. doi: 10.1038/s41598-022-10358-x.
10. Assessing Children's Fine Motor Skills With Sensor-Augmented Toys: Machine Learning Approach.
J Med Internet Res. 2021 Apr 22;23(4):e24237. doi: 10.2196/24237.

Cited by

1. Collaborative filtering models an experimental and detailed comparative study.
Sci Rep. 2025 Aug 28;15(1):31667. doi: 10.1038/s41598-025-15096-4.
2. Evaluation of inflammatory bowel disease-related sleep disorders based on an interpretable machine learning approach: a multicenter study in China.
Therap Adv Gastroenterol. 2025 Aug 15;18:17562848251359141. doi: 10.1177/17562848251359141. eCollection 2025.
3. Machine learning-based predictive modeling of angina pectoris in an elderly community-dwelling population: Results from the PoCOsteo study.
PLoS One. 2025 Aug 5;20(8):e0329023. doi: 10.1371/journal.pone.0329023. eCollection 2025.
4. A web-based prediction model for brain metastasis in non-small cell lung cancer patients.
Discov Oncol. 2025 Jul 29;16(1):1438. doi: 10.1007/s12672-025-03298-1.
5. A hybrid approach to enhance HbA1c prediction accuracy while minimizing the number of associated predictors: A case-control study in Saudi Arabia.
PLoS One. 2025 Jun 17;20(6):e0326315. doi: 10.1371/journal.pone.0326315. eCollection 2025.
6. Exploring supportive care needs of lung cancer patients in China and predicting with machine learning models.
Support Care Cancer. 2025 Jun 13;33(7):573. doi: 10.1007/s00520-025-09619-y.
7. Machine learning is changing osteoporosis detection: an integrative review.
Osteoporos Int. 2025 Jun 10. doi: 10.1007/s00198-025-07541-x.
8. Unlocking the potential of wearable technology: Fitbit-derived measures for predicting ADHD in adolescents.
Front Child Adolesc Psychiatry. 2025 May 22;4:1504323. doi: 10.3389/frcha.2025.1504323. eCollection 2025.
9. Enlightened prognosis: Hepatitis prediction with an explainable machine learning approach.
PLoS One. 2025 Apr 2;20(4):e0319078. doi: 10.1371/journal.pone.0319078. eCollection 2025.
10. A Deep Convolution Method for Hypertension Detection from Ballistocardiogram Signals with Heat-Map-Guided Data Augmentation.
Bioengineering (Basel). 2025 Feb 21;12(3):221. doi: 10.3390/bioengineering12030221.