High-dimensional similarity searches using query driven dynamic quantization and distributed indexing.

Authors

Guzun Gheorghi, Canahuate Guadalupe

Affiliations

Department of Computer Engineering, San Jose State University, San Jose, CA, USA.

Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA.

Publication information

Distrib Parallel Databases. 2020;38:255-286. doi: 10.1007/s10619-019-07266-x. Epub 2019 Apr 11.

PMID:32863590
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7453591/
Abstract

The concept of similarity is used as the basis for many data exploration and data mining tasks. Nearest neighbor (NN) queries identify the most similar items or, in terms of distance, the points closest to a query point. Similarity is traditionally characterized using a distance function between multi-dimensional feature vectors. However, when the data is high-dimensional, traditional distance functions fail to significantly distinguish between the closest and furthest points, as a few dissimilar dimensions dominate the distance function. Localized similarity functions, i.e. functions that only consider dimensions close to the query, quantize each dimension independently and only compute similarity for the dimensions where the query and the points fall into the same bin. These quantizations are query-agnostic, and accuracy could be improved by using a query-dependent quantization instead. In this work we propose a query-dependent equi-depth (QED) on-the-fly quantization method to improve high-dimensional similarity searches. The quantization is done for each dimension at query time: localized scores are generated for the closest fraction of the points, while a constant penalty is applied to the rest. QED not only improves the quality of the distance metric but also improves query-time performance by filtering out non-relevant data. We propose a distributed indexing and query algorithm to efficiently compute QED. Our experimental results, over datasets with hundreds of dimensions, show improvements in classification accuracy as well as query performance up to one order of magnitude faster than Manhattan-based sequential-scan NN queries. Furthermore, similarity searches with QED show linear or better scalability with respect to the number of dimensions and the number of compute nodes.
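The abstract describes QED only at a high level, so the Python sketch below illustrates the idea under our own assumptions; it is not the authors' implementation. The helper names `qed_scores`, `qed_knn`, and `manhattan_knn` are hypothetical, and so is the choice of penalty: here a point outside the query's bin in a dimension is charged the bin radius (the largest in-bin difference), which is one possible "constant penalty". The toy data shows the effect the abstract describes: point 0 is very close to the query in three of four dimensions but far in the fourth, so Manhattan distance ranks it behind a uniformly moderate point, whereas the QED-style score caps the outlier dimension's contribution.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the authors' implementation.


def qed_scores(data: Sequence[Sequence[float]], query: Sequence[float],
               fraction: float = 0.1) -> List[float]:
    """Return a QED-style dissimilarity score per point (lower = more similar).

    Assumed scheme: in each dimension, the ceil(fraction * n) points closest
    to the query define the query's equi-depth bin. Points inside the bin add
    their absolute difference; points outside add a constant penalty, taken
    here (an assumption) to be the bin radius in that dimension.
    """
    n = len(data)
    m = max(1, math.ceil(fraction * n))  # points per query-centered bin
    scores = [0.0] * n
    for dim, q in enumerate(query):
        diffs = [abs(row[dim] - q) for row in data]
        # Query-time quantization: the bin edge is the m-th smallest difference.
        radius = sorted(diffs)[m - 1]
        for i, d in enumerate(diffs):
            # Localized score inside the bin, constant penalty outside it.
            scores[i] += d if d <= radius else radius
    return scores


def qed_knn(data, query, k: int, fraction: float = 0.1) -> List[int]:
    """Indices of the k points with the lowest QED-style score (ties broken by index)."""
    scores = qed_scores(data, query, fraction)
    return sorted(range(len(data)), key=lambda i: (scores[i], i))[:k]


def manhattan_knn(data, query, k: int) -> List[int]:
    """Baseline: k nearest neighbors by Manhattan (L1) distance over a full scan."""
    dist = [sum(abs(x - q) for x, q in zip(row, query)) for row in data]
    return sorted(range(len(data)), key=lambda i: (dist[i], i))[:k]


# Toy data: point 0 is close to the query in three dimensions but far in the
# fourth; point 1 is moderately far in every dimension; the rest are far.
data = [[0.1, 0.1, 0.1, 5.0], [1.0, 1.0, 1.0, 1.0]] + [[3.0] * 4 for _ in range(8)]
query = [0.0, 0.0, 0.0, 0.0]
print(manhattan_knn(data, query, 1))          # [1]
print(qed_knn(data, query, 1, fraction=0.2))  # [0]
```

Because every per-dimension contribution in this sketch is capped at the bin radius, a point that falls outside the query's bin in every dimension always receives the maximum possible score (the sum of the radii), so such points could be skipped without scoring them; this is one plausible reading of the abstract's claim about filtering out non-relevant data. The score is also a sum of independent per-dimension terms, so dimensions could in principle be split across compute nodes and the partial scores added, although the abstract does not say how the authors' distributed index is organized.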

Similar articles

1. High-dimensional similarity searches using query driven dynamic quantization and distributed indexing.
   Distrib Parallel Databases. 2020;38:255-286. doi: 10.1007/s10619-019-07266-x. Epub 2019 Apr 11.
2. Distributed query-aware quantization for high-dimensional similarity searches.
   Adv Database Technol. 2018 Mar;2018:373-384. doi: 10.5441/002/edbt.2018.33.
3. Supporting Dynamic Quantization for High-Dimensional Data Analytics.
   Proc ExploreDB17 (2017). 2017 May;2017. doi: 10.1145/3077331.3077336.
4. Asymmetric Mapping Quantization for Nearest Neighbor Search.
   IEEE Trans Pattern Anal Mach Intell. 2020 Jul;42(7):1783-1790. doi: 10.1109/TPAMI.2019.2925347. Epub 2019 Jun 27.
5. Approximate nearest neighbor search by residual vector quantization.
   Sensors (Basel). 2010;10(12):11259-73. doi: 10.3390/s101211259. Epub 2010 Dec 8.
6. Query Optimization for Distributed Spatio-Temporal Sensing Data Processing.
   Sensors (Basel). 2022 Feb 23;22(5):1748. doi: 10.3390/s22051748.
7. Updatable privacy-preserving k-nearest neighbor query in location-based service.
   Peer Peer Netw Appl. 2022;15(2):1076-1089. doi: 10.1007/s12083-021-01290-4. Epub 2022 Jan 7.
8. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.
   J Chem Inf Model. 2012 Aug 27;52(8):1926-35. doi: 10.1021/ci200606t. Epub 2012 Aug 7.
9. Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features.
   Knowl Based Syst. 2023 Oct 25;278. doi: 10.1016/j.knosys.2023.110907. Epub 2023 Aug 18.
10. Implementation and evaluation of a multivariate abstraction-based, interval-based dynamic time-warping method as a similarity measure for longitudinal medical records.
   J Biomed Inform. 2021 Nov;123:103919. doi: 10.1016/j.jbi.2021.103919. Epub 2021 Oct 8.