Development and evaluation of a novel framework to enhance k-NN algorithm's accuracy in data sparsity contexts.

Author information

Giannopoulos Panagiotis G, Dasaklis Thomas K, Rachaniotis Nikolaos

Affiliations

School of Social Sciences, Hellenic Open University, Patras, 26335, Greece.

Department of Industrial Management and Technology, University of Piraeus, Piraeus, 18534, Greece.

Publication information

Sci Rep. 2024 Oct 23;14(1):25036. doi: 10.1038/s41598-024-76909-6.

DOI:10.1038/s41598-024-76909-6

PMID:39443669

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11499816/

Abstract

This paper presents a novel framework for implementing the k-NN algorithm, designed to enhance its accuracy in contexts with sparse data. The framework addresses limitations in the algorithm's training process by optimizing data structures. It employs composite datasets generated from the initial data using a data-driven fuzzy Analytic Hierarchy Process weighting scheme. This approach is designed to enhance the informational content of the initial datasets, thus reducing entropy and implementation uncertainty. The framework was evaluated using 75 publicly available datasets and 3 generated datasets, demonstrating significant accuracy improvements across various values of the k parameter. The findings were generalized using non-parametric hypothesis tests, and their sensitivity was assessed by applying different distance metrics. By enhancing informational content, the composite data structures contribute to both accuracy improvements and scalability, particularly in data-sparse contexts. This relationship underscores the critical role of entropy in enhancing the performance of explainable machine learning algorithms, providing a valuable and interpretable tool for transforming data structures in sparse data environments.
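As a rough illustration of the ingredients the abstract names (k-NN classification, per-feature weights, and interchangeable distance metrics), the following is a minimal sketch only. It is not the authors' framework: the fuzzy Analytic Hierarchy Process weighting and the composite dataset construction are not described in this record, so the `weights` vector, the toy data, and the function names `minkowski` and `knn_predict` are all assumptions made for illustration.

```python
from collections import Counter


def minkowski(a, b, weights, p):
    """Weighted Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(w * abs(x - y) ** p for x, y, w in zip(a, b, weights)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=3, weights=None, p=2):
    """Majority vote among the k training points closest to `query`."""
    if weights is None:
        weights = [1.0] * len(query)  # plain, unweighted k-NN
    # Rank training points by distance to the query and keep the k nearest.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: minkowski(pair[0], query, weights, p),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: feature 0 separates the classes,
# feature 1 is noise. The weights below are hypothetical, not derived
# from any fuzzy AHP procedure.
train_X = [(0.0, 5.0), (0.2, 0.0), (0.1, 9.0), (1.0, 5.2), (0.9, 0.1), (1.1, 9.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
query = (0.1, 5.1)

unweighted = knn_predict(train_X, train_y, query, k=3)
weighted = knn_predict(train_X, train_y, query, k=3, weights=[1.0, 0.0], p=1)
```

Changing `p` swaps the distance metric without touching the voting logic, which is one simple way to run the kind of sensitivity check across metrics that the abstract mentions.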

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fbf/11499816/4e85c3ca1625/41598_2024_76909_Fig1_HTML.jpg

Similar articles

Development and evaluation of a novel framework to enhance k-NN algorithm's accuracy in data sparsity contexts.

Sci Rep. 2024 Oct 23;14(1):25036. doi: 10.1038/s41598-024-76909-6.

Publisher Correction: Development and evaluation of a novel framework to enhance k-NN algorithm's accuracy in data sparsity contexts.

Sci Rep. 2024 Nov 15;14(1):28210. doi: 10.1038/s41598-024-79198-1.

Hybrid evolutionary machine learning model for advanced intrusion detection architecture for cyber threat identification.

PLoS One. 2024 Sep 12;19(9):e0308206. doi: 10.1371/journal.pone.0308206. eCollection 2024.

[Fully Automatic Glioma Segmentation Algorithm of Magnetic Resonance Imaging Based on 3D-UNet With More Global Contextual Feature Extraction: An Improvement on Insufficient Extraction of Global Features].

Sichuan Da Xue Xue Bao Yi Xue Ban. 2024 Mar 20;55(2):447-454. doi: 10.12182/20240360208.

A Robust and High-Dimensional Clustering Algorithm Based on Feature Weight and Entropy.

Entropy (Basel). 2023 Mar 16;25(3):510. doi: 10.3390/e25030510.

Enhancing classification accuracy of HRF signals in fNIRS using semi-supervised learning and filtering.

Prog Brain Res. 2024;290:83-104. doi: 10.1016/bs.pbr.2024.05.009. Epub 2024 May 31.

Pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training.

Eur Radiol. 2021 Oct;31(10):7888-7900. doi: 10.1007/s00330-021-07833-w. Epub 2021 Mar 27.

Estimating potential evapotranspiration based on self-optimizing nearest neighbor algorithms: a case study in arid-semiarid environments, Northwest of China.

Environ Sci Pollut Res Int. 2020 Oct;27(30):37176-37187. doi: 10.1007/s11356-019-06597-7. Epub 2019 Oct 25.

Smoking Classification Using Novel Plasma Cytokines by implementing Machine Learning and Statistical Methods.

Proc (Int Conf Comput Sci Comput Intell). 2023 Dec;2023:686-694. doi: 10.1109/csci62032.2023.00118. Epub 2024 Jul 19.

Machine learning algorithms for predicting COVID-19 mortality in Ethiopia.

BMC Public Health. 2024 Jun 28;24(1):1728. doi: 10.1186/s12889-024-19196-0.
