Giannopoulos Panagiotis G, Dasaklis Thomas K, Rachaniotis Nikolaos
School of Social Sciences, Hellenic Open University, Patras, 26335, Greece.
Department of Industrial Management and Technology, University of Piraeus, Piraeus, 18534, Greece.
Sci Rep. 2024 Oct 23;14(1):25036. doi: 10.1038/s41598-024-76909-6.
This paper presents a novel framework for implementing the k-NN algorithm, designed to improve its accuracy in contexts with sparse data. The framework addresses limitations in the algorithm's training process by optimizing data structures. It employs composite datasets generated from the initial data using a data-driven fuzzy Analytic Hierarchy Process (AHP) weighting scheme. This approach aims to increase the informational content of the initial datasets, thereby reducing entropy and implementation uncertainty. The framework was evaluated on 75 publicly available datasets and 3 generated datasets, demonstrating significant accuracy improvements across various values of the k parameter. The findings were generalized using non-parametric hypothesis tests, and their sensitivity was assessed by applying different distance metrics. By increasing informational content, the composite data structures contribute to both accuracy and scalability, particularly in data-sparse contexts. This relationship underscores the critical role of entropy in the performance of explainable machine learning algorithms, and the framework provides a valuable and interpretable tool for transforming data structures in sparse data environments.
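The evaluation described in the abstract (k-NN accuracy compared across several k values and distance metrics, with features weighted before classification) can be illustrated with a small sketch. This is not the authors' framework: the fuzzy AHP weighting and the composite dataset construction are not specified in the abstract and are not reproduced here. The feature weights, the toy data, and all function names below are hypothetical, chosen only to show how a weighting of features can change k-NN accuracy under leave-one-out evaluation.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, weights, metric="euclidean"):
    """Distance between two points after scaling each feature difference by its weight."""
    diffs = [w * abs(x - y) for x, y, w in zip(a, b, weights)]
    if metric == "euclidean":
        return sqrt(sum(d * d for d in diffs))
    if metric == "manhattan":
        return sum(diffs)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(train_X, train_y, query, k, weights, metric="euclidean"):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights, metric),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, weights, metric="euclidean"):
    """Leave-one-out accuracy: classify each point using all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k, weights, metric) == y[i]:
            hits += 1
    return hits / len(X)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
TOY_X = [[1.0, 9.0], [1.2, 1.0], [0.8, 5.0], [3.0, 8.5], [3.2, 1.5], [2.8, 5.5]]
TOY_Y = ["a", "a", "a", "b", "b", "b"]

# Hypothetical weight vectors; a weighting scheme such as fuzzy AHP would
# supply its own values, which are not given in the abstract.
UNIFORM = [1.0, 1.0]
INFORMED = [1.0, 0.0]

if __name__ == "__main__":
    for metric in ("euclidean", "manhattan"):
        for k in (1, 3):
            base = loo_accuracy(TOY_X, TOY_Y, k, UNIFORM, metric)
            tuned = loo_accuracy(TOY_X, TOY_Y, k, INFORMED, metric)
            print(f"{metric:9s} k={k}: uniform={base:.2f} weighted={tuned:.2f}")
```

In this toy setting the noisy feature dominates the unweighted distance, so down-weighting it lets the nearest neighbours reflect class membership. The sketch is pure Python for portability; a real experiment would more likely rely on an established k-NN implementation and a proper train/test protocol.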