

Random k conditional nearest neighbor for high-dimensional data.

Authors

Lu Jiaxuan, Gweon Hyukjun

Affiliations

University of Western Ontario, London, ON, Canada.

Publication

PeerJ Comput Sci. 2025 Jan 24;11:e2497. doi: 10.7717/peerj-cs.2497. eCollection 2025.

Abstract

The k nearest neighbor (kNN) approach is a simple and effective algorithm for classification, and a number of variants of the kNN algorithm have been proposed. One limitation of kNN is that the method may be less effective when the data contain many noisy features, because such features contribute non-informative influence to the distance calculation. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address this limitation of nearest-neighbor based approaches in high-dimensional data, we propose to extend the k conditional nearest neighbor (kCNN) method, an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weight individual classifiers based on the level of separation of the feature subsets. We investigate the properties of the proposed method using simulation. Moreover, experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.
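The core idea described in the abstract, aggregating base classifiers trained on random feature subsets, can be sketched as follows. This is a minimal illustration only: it uses plain kNN as the base learner (the paper's kCNN conditions on class information and differs), uses an unweighted majority vote rather than the paper's separation-score weighting, and all function names and parameters are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    # Plain kNN majority vote; a stand-in for the paper's kCNN base learner.
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    vals, counts = np.unique(y_train[idx], return_counts=True)
    return vals[np.argmax(counts)]

def random_subspace_predict(X_train, y_train, x, k=3,
                            n_subsets=10, subset_size=2, seed=0):
    # Aggregate base classifiers, each built on a randomly sampled
    # feature subset, by unweighted majority vote (the paper instead
    # weights each classifier by a separation score of its subset).
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_subsets):
        feats = rng.choice(X_train.shape[1], size=subset_size, replace=False)
        votes.append(knn_predict(X_train[:, feats], y_train, x[feats], k))
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```

Because each base classifier sees only a small random subset of features, noisy dimensions that would dominate a full-feature distance calculation are excluded from many of the base votes, which is the motivation for this style of ensemble in high-dimensional settings.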


