
Random k conditional nearest neighbor for high-dimensional data.

Authors

Lu Jiaxuan, Gweon Hyukjun

Affiliation

University of Western Ontario, London, ON, Canada.

Publication details

PeerJ Comput Sci. 2025 Jan 24;11:e2497. doi: 10.7717/peerj-cs.2497. eCollection 2025.

DOI:10.7717/peerj-cs.2497
PMID:39896033
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784752/
Abstract

The k nearest neighbor (kNN) approach is a simple and effective algorithm for classification, and a number of variants have been proposed based on it. One limitation of kNN is that it may be less effective when the data contain many noisy features, because non-informative features distort the distance calculation. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address this limitation of nearest-neighbor-based approaches in high-dimensional data, we propose an extension of the k conditional nearest neighbor (kCNN) method, which is an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weight individual classifiers based on the level of separation of the feature subsets. We investigate the properties of the proposed method using simulation. Moreover, experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.
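
The ensemble idea in the abstract (several nearest-neighbor classifiers, each built on a random feature subset and weighted by how well that subset separates the classes) can be illustrated with a rough sketch. The abstract does not give the kCNN probability formula or the score metric, so both are stand-ins here and should be read as assumptions: `kcnn_like_proba` uses the inverse distance to the k-th nearest neighbor within each class, and `separation_score` is a hypothetical between-class to within-class variance ratio. All function names and default parameters are illustrative and are not taken from the paper.

```python
import numpy as np

def kth_class_distances(X_train, y_train, X_test, k, classes):
    """Distance from each test point to its k-th nearest neighbor within each class."""
    out = np.empty((X_test.shape[0], len(classes)))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]
        # Pairwise Euclidean distances, shape (n_test, n_class_c).
        d = np.sqrt(((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2))
        kk = min(k, Xc.shape[0])  # guard against classes with fewer than k points
        out[:, j] = np.partition(d, kk - 1, axis=1)[:, kk - 1]
    return out

def kcnn_like_proba(X_train, y_train, X_test, k, classes, eps=1e-12):
    """ASSUMPTION: class scores proportional to 1 / (k-th within-class distance)."""
    inv = 1.0 / (kth_class_distances(X_train, y_train, X_test, k, classes) + eps)
    return inv / inv.sum(axis=1, keepdims=True)

def separation_score(X, y, classes, eps=1e-12):
    """HYPOTHETICAL weight: mean between-class / within-class variance ratio."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(between / (within + eps)))

def random_kcnn_predict(X_train, y_train, X_test, k=3, n_subsets=50,
                        subset_size=5, seed=0):
    """Weighted average of kCNN-like classifiers built on random feature subsets."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    total = np.zeros((X_test.shape[0], len(classes)))
    weight_sum = 0.0
    for _ in range(n_subsets):
        # Sample a feature subset without replacement.
        feats = rng.choice(X_train.shape[1], size=subset_size, replace=False)
        w = separation_score(X_train[:, feats], y_train, classes)
        total += w * kcnn_like_proba(X_train[:, feats], y_train,
                                     X_test[:, feats], k, classes)
        weight_sum += w
    proba = total / weight_sum
    return classes[proba.argmax(axis=1)], proba
```

Calling `random_kcnn_predict(X_train, y_train, X_test)` returns the predicted labels and the weighted class-probability matrix. Subsets made up only of noisy features receive small weights under this stand-in score, which is the intuition the abstract describes, though the paper's actual metric may behave differently.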


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/d6862ba81cda/peerj-cs-11-2497-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/a3f7dd5c5dcd/peerj-cs-11-2497-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/a4828d527e26/peerj-cs-11-2497-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/e57f10f23071/peerj-cs-11-2497-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/b0911eebe449/peerj-cs-11-2497-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9656/11784752/7377a9ac6e28/peerj-cs-11-2497-g006.jpg
