• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

用于高维数据和大数据集上的k近邻搜索算法的望远镜索引

Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets.

作者信息

K R Madhavan, Kurban Hasan, Kulekci Oguzhan M, Dalkilic Mehmet M

机构信息

Computer Science Department, Indiana University, Bloomington, Indiana, IN, USA.

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

出版信息

Sci Rep. 2025 Jul 9;15(1):24788. doi: 10.1038/s41598-025-09856-5.

DOI:10.1038/s41598-025-09856-5
PMID:40634470
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC12241374/
Abstract

When k-Nearest-Neighbors ([Formula: see text]-NN) was conceived more than 70 years ago, computation, as we use it now, would be hardly recognizable. Since then, technology has improved by orders of magnitude, including unprecedented connectivity. However, [Formula: see text]-NN has remained virtually unchanged, exposing its shortcomings for today's needs: becoming overwhelmed when presented with large, high-dimensional data. Although space partitioning data structures, especially k-d trees and ball-trees, have improved performance in larger data, they remain inadequate when data is also high-dimensional. Experiments confirm that space partitioning becomes ineffective in high-dimensional data because most of the search space is explored needlessly. Our strategy is to partition the data into small groups of points similarly distanced from a reference point in a B+ tree data structure and use this data structure to limit the search space of a [Formula: see text]-NN query. Further, we establish that the limited search space chosen by the B+ tree structure can be effectively explored by any indexing techniques applicable to the entire data. We then present our algorithm [Formula: see text]-NN with partitioning (ti[Formula: see text]-NN), including computational analysis and experiments. Our detailed evaluation demonstrates significant speedup achieved by ti[Formula: see text]-NN over the naive, [Formula: see text]-d tree, [Formula: see text]-tree based [Formula: see text]-NN and other state-of-the-art approximate [Formula: see text]-NN search approaches in high dimensional data.

摘要

70多年前提出k近邻算法(k-Nearest-Neighbors,k-NN)时,我们现在所使用的计算方式几乎无法想象。从那时起,技术已经有了数量级的提升,包括前所未有的连接性。然而,k-NN算法几乎没有什么变化,暴露出其在满足当今需求方面的不足:面对大规模的高维数据时会不堪重负。尽管空间划分数据结构,特别是k-d树和球树,在处理更大规模数据时提高了性能,但在数据也是高维的情况下仍然不够。实验证实,在高维数据中空间划分变得无效,因为大部分搜索空间都被不必要地探索了。我们的策略是在B+树数据结构中将数据划分为与参考点距离相近的小数据点组,并使用这种数据结构来限制k-NN查询的搜索空间。此外,我们确定B+树结构选择的有限搜索空间可以通过适用于整个数据的任何索引技术有效地探索。然后我们提出了带划分的k-NN算法(ti k-NN),包括计算分析和实验。我们的详细评估表明,在高维数据中,ti k-NN相对于朴素的、基于k-d树、球树的k-NN以及其他最先进的近似k-NN搜索方法实现了显著的加速。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/a6cbda82f082/41598_2025_9856_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/5eeebdd94271/41598_2025_9856_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/18080677fcbe/41598_2025_9856_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/14673e378ebc/41598_2025_9856_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/8c8b5a0145c1/41598_2025_9856_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/49a039007410/41598_2025_9856_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/681316a74604/41598_2025_9856_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/b8c7948294cd/41598_2025_9856_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/0d8da3df38de/41598_2025_9856_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/a37457f729dd/41598_2025_9856_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/a6cbda82f082/41598_2025_9856_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/5eeebdd94271/41598_2025_9856_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/18080677fcbe/41598_2025_9856_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/14673e378ebc/41598_2025_9856_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/8c8b5a0145c1/41598_2025_9856_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/49a039007410/41598_2025_9856_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/681316a74604/41598_2025_9856_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/b8c7948294cd/41598_2025_9856_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/0d8da3df38de/41598_2025_9856_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/a37457f729dd/41598_2025_9856_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c07e/12241374/a6cbda82f082/41598_2025_9856_Fig8_HTML.jpg

相似文献

1
Telescope indexing for k-nearest neighbor search algorithms over high dimensional data & large data sets.用于高维数据和大数据集上的k近邻搜索算法的望远镜索引
Sci Rep. 2025 Jul 9;15(1):24788. doi: 10.1038/s41598-025-09856-5.
2
A Novel Design of a Portable Birdcage via Meander Line Antenna (MLA) to Lower Beta Amyloid (Aβ) in Alzheimer's Disease.一种通过曲折线天线(MLA)设计的便携式鸟笼,用于降低阿尔茨海默病中的β淀粉样蛋白(Aβ)。
IEEE J Transl Eng Health Med. 2025 Apr 10;13:158-173. doi: 10.1109/JTEHM.2025.3559693. eCollection 2025.
3
Sexual Harassment and Prevention Training性骚扰与预防培训
4
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.在基层医疗机构或医院门诊环境中,如果患者出现以下症状和体征,可判断其是否患有 COVID-19。
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5
Antidepressants for pain management in adults with chronic pain: a network meta-analysis.抗抑郁药治疗成人慢性疼痛的疼痛管理:一项网络荟萃分析。
Health Technol Assess. 2024 Oct;28(62):1-155. doi: 10.3310/MKRT2948.
6
Diagnostic test accuracy and cost-effectiveness of tests for codeletion of chromosomal arms 1p and 19q in people with glioma.染色体臂 1p 和 19q 缺失的检测在胶质瘤患者中的诊断准确性和成本效益。
Cochrane Database Syst Rev. 2022 Mar 2;3(3):CD013387. doi: 10.1002/14651858.CD013387.pub2.
7
The Black Book of Psychotropic Dosing and Monitoring.《精神药物剂量与监测黑皮书》
Psychopharmacol Bull. 2024 Jul 8;54(3):8-59.
8
Comparison of cellulose, modified cellulose and synthetic membranes in the haemodialysis of patients with end-stage renal disease.纤维素、改性纤维素和合成膜在终末期肾病患者血液透析中的比较。
Cochrane Database Syst Rev. 2001(3):CD003234. doi: 10.1002/14651858.CD003234.
9
Use of endoanal ultrasound for reducing the risk of complications related to anal sphincter injury after vaginal birth.使用经肛门超声降低阴道分娩后肛门括约肌损伤相关并发症的风险。
Cochrane Database Syst Rev. 2015 Oct 29;2015(10):CD010826. doi: 10.1002/14651858.CD010826.pub2.
10
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.对紫杉醇、多西他赛、吉西他滨和长春瑞滨在非小细胞肺癌中的临床疗效和成本效益进行的快速系统评价。
Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.

本文引用的文献

1
Array programming with NumPy.使用 NumPy 进行数组编程。
Nature. 2020 Sep;585(7825):357-362. doi: 10.1038/s41586-020-2649-2. Epub 2020 Sep 16.
2
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.使用分层可导航小世界图进行高效且鲁棒的近似最近邻搜索
IEEE Trans Pattern Anal Mach Intell. 2020 Apr;42(4):824-836. doi: 10.1109/TPAMI.2018.2889473. Epub 2018 Dec 28.
3
Scalable Nearest Neighbor Algorithms for High Dimensional Data.高维数据的可扩展最近邻算法。
IEEE Trans Pattern Anal Mach Intell. 2014 Nov;36(11):2227-40. doi: 10.1109/TPAMI.2014.2321376.
4
A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.一种使用k均值聚类和三角不等式进行高维搜索的快速精确k近邻算法。
Proc Int Jt Conf Neural Netw. 2012 Feb 8;43(6):2351-2358. doi: 10.1016/j.patcog.2010.01.003.