Salvador-Meneses Jaime, Ruiz-Chavez Zoila, Garcia-Rodriguez Jose
Facultad de Ingeniería, Ciencias Físicas y Matemática, Universidad Central del Ecuador, Quito 170129, Ecuador.
Computer Technology Department, University of Alicante, 03080 Alicante, Spain.
Entropy (Basel). 2019 Feb 28;21(3):234. doi: 10.3390/e21030234.
The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods; however, its memory consumption grows with the size of the dataset, which makes it impractical for large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less kNN type, designed to work with categorical data. Categorical data, due to their nature, can be compressed to decrease the memory required at classification time. The method adds a compression phase before the algorithm is applied to the compressed data. This allows the whole dataset to be kept in memory, which leads to a considerable reduction in the amount of memory required. Experiments and tests carried out on well-known datasets show the reduction in the volume of information stored in memory while the classification accuracy is maintained. They also show a slight decrease in processing time, because the information is decompressed in real time (on-the-fly) while the algorithm is running.
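The idea described in the abstract can be illustrated with a minimal sketch: categorical attributes are encoded as small integer codes, each training row is bit-packed into a single integer (the compression phase), and kNN then unpacks each row on-the-fly while computing a Hamming-style distance over the categorical attributes. This is an illustrative assumption about the general approach, not the paper's exact compression scheme; all function names below (`pack`, `unpack`, `knn_predict`) are hypothetical.

```python
from collections import Counter

def pack(rows, bits_per_attr):
    """Compress each row of categorical codes into one integer,
    using bits_per_attr[i] bits for attribute i (illustrative scheme)."""
    packed = []
    for row in rows:
        word = 0
        for bits, code in zip(bits_per_attr, row):
            word = (word << bits) | code
        packed.append(word)
    return packed

def unpack(word, bits_per_attr):
    """Decompress one packed row back into its list of categorical codes."""
    vals = []
    for bits in reversed(bits_per_attr):
        vals.append(word & ((1 << bits) - 1))
        word >>= bits
    return list(reversed(vals))

def knn_predict(packed_train, labels, query, bits_per_attr, k=3):
    """Classify `query` by majority vote among its k nearest neighbors,
    decompressing each stored row on-the-fly (Hamming distance on categories)."""
    dists = []
    for word, label in zip(packed_train, labels):
        row = unpack(word, bits_per_attr)
        d = sum(a != b for a, b in zip(row, query))  # mismatch count
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

# Toy usage: 3 categorical attributes needing 2, 2, and 1 bits respectively.
bits = [2, 2, 1]
train = [[0, 1, 0], [0, 1, 1], [3, 2, 0], [3, 3, 1]]
labels = ["a", "a", "b", "b"]
compressed = pack(train, bits)      # whole training set held as small integers
print(knn_predict(compressed, labels, [0, 1, 0], bits, k=3))  # prints "a"
```

Because every row collapses to one machine word, the entire training set stays in memory in compressed form, matching the memory-reduction argument of the abstract.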