Nearest neighbors by neighborhood counting.

Author information

Wang Hui

Affiliation

School of Computing and Mathematics, Faculty of Engineering, University of Ulster at Jordanstown, BT37 0QB, Northern Ireland, UK.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2006 Jun;28(6):942-53. doi: 10.1109/TPAMI.2006.126.

DOI: 10.1109/TPAMI.2006.126
PMID: 16724588
Abstract

Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
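The counting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the NCM formula, so everything below is an assumption made for illustration: a neighborhood is taken to be a product of per-attribute regions, where a numerical attribute contributes an integer interval inside a known range and a categorical attribute contributes a subset of its value domain. The function names, the `schema` layout, and the use of raw (unnormalized) counts are hypothetical and may differ from the measure derived in the paper.

```python
from collections import Counter


def count_numeric(x, y, lo, hi):
    """Count integer intervals [a, b] inside [lo, hi] that contain both x and y."""
    left_choices = min(x, y) - lo + 1   # a ranges over lo..min(x, y)
    right_choices = hi - max(x, y) + 1  # b ranges over max(x, y)..hi
    return left_choices * right_choices


def count_categorical(x, y, n_values):
    """Count subsets of an n_values-sized domain that contain both x and y."""
    fixed = 1 if x == y else 2          # number of values forced into the subset
    return 2 ** (n_values - fixed)      # every other value is free to be in or out


def ncm_similarity(p, q, schema):
    """Number of product neighborhoods covering both p and q (assumed definition).

    schema holds one entry per attribute: ("num", lo, hi) or ("cat", n_values).
    """
    total = 1
    for xi, yi, spec in zip(p, q, schema):
        if spec[0] == "num":
            total *= count_numeric(xi, yi, spec[1], spec[2])
        else:
            total *= count_categorical(xi, yi, spec[1])
    return total


def knn_predict(query, train, labels, schema, k):
    """kNN vote using the count as a similarity: larger count means closer."""
    ranked = sorted(
        range(len(train)),
        key=lambda i: ncm_similarity(query, train[i], schema),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: one numerical attribute in [0, 10], one categorical with 3 values.
schema = [("num", 0, 10), ("cat", 3)]
train = [(2, "r"), (3, "r"), (9, "g"), (8, "g")]
labels = ["A", "A", "B", "B"]
prediction = knn_predict((4, "r"), train, labels, schema, k=3)
```

In this sketch both attribute types are handled by the same operation, counting regions that cover two values, which mirrors the abstract's remark that numerical and categorical data are treated in a conceptually uniform way. Each pairwise similarity costs one pass over the attributes, the same order as a Euclidean distance.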
