Development and evaluation of a novel framework to enhance k-NN algorithm's accuracy in data sparsity contexts.

Authors

Giannopoulos Panagiotis G, Dasaklis Thomas K, Rachaniotis Nikolaos

Affiliations

School of Social Sciences, Hellenic Open University, Patras, 26335, Greece.

Department of Industrial Management and Technology, University of Piraeus, Piraeus, 18534, Greece.

Publication

Sci Rep. 2024 Oct 23;14(1):25036. doi: 10.1038/s41598-024-76909-6.

Abstract

This paper presents a novel framework for implementing the k-NN algorithm, designed to enhance its accuracy in contexts with sparse data. The framework addresses limitations in the algorithm's training process by optimizing data structures. It employs composite datasets generated from the initial data using a data-driven fuzzy Analytic Hierarchy Process weighting scheme. This approach is designed to enhance the informational content in the initial datasets, thus reducing the entropy and implementation uncertainty. The framework was evaluated using 75 publicly available datasets and 3 generated datasets, demonstrating significant accuracy improvements across various values of the k parameter. The findings were generalized using non-parametric hypothesis tests, while the sensitivity of the results was assessed by applying different distance metrics. By enhancing informational content, the composite data structures contribute to both accuracy improvements and scalability, particularly in data-sparse contexts. This relationship underscores the critical role of entropy in enhancing the performance of explainable machine learning algorithms, providing a valuable and interpretable tool for transforming data structures in sparse data environments.
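
The abstract does not give the weighting formula or the construction of the composite datasets, so the following is only a minimal sketch of the general idea: rescale features by a weight vector before running k-NN, then compare accuracy across several values of k and two distance metrics. The weights, the toy data, and all function names (`apply_weights`, `knn_predict`, `loo_accuracy`) are assumptions made for illustration; they are not the paper's fuzzy AHP weights or its datasets.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def apply_weights(rows, weights):
    """Scale every feature by its weight (placeholder for a weighting scheme)."""
    return [[w * x for w, x in zip(weights, row)] for row in rows]


def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: distance(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, distance):
    """Leave-one-out accuracy: each point is classified by all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k, distance) == y[i]
    return hits / len(X)


# Toy data (invented): feature 0 separates the classes, feature 1 is
# large-scale noise that dominates unweighted distances.
X = [[0.0, 5.0], [0.2, -4.0], [0.1, 3.0],
     [1.0, -5.0], [1.2, 4.0], [1.1, -3.0]]
y = ["a", "a", "a", "b", "b", "b"]

# Hypothetical weights; in the paper these would come from the fuzzy AHP scheme.
WEIGHTS = [0.9, 0.1]

if __name__ == "__main__":
    X_weighted = apply_weights(X, WEIGHTS)
    for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
        for k in (1, 3):
            raw = loo_accuracy(X, y, k, metric)
            weighted = loo_accuracy(X_weighted, y, k, metric)
            print(f"{name:9s} k={k}: raw={raw:.2f} weighted={weighted:.2f}")
```

On this toy example, down-weighting the noisy feature lets the informative one drive the neighbour search. The paper's evaluation additionally applies non-parametric hypothesis tests across many datasets, which this sketch does not attempt.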

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fbf/11499816/4e85c3ca1625/41598_2024_76909_Fig1_HTML.jpg
