On component-wise dissimilarity measures and metric properties in pattern recognition.

Authors

De Santis Enrico, Martino Alessio, Rizzi Antonello

Affiliations

Department of Information Engineering, Electronics and Telecommunications, University of Roma "La Sapienza", Rome, Italy.

Department of Business and Management, LUISS University, Rome, Italy.

Publication information

PeerJ Comput Sci. 2022 Oct 10;8:e1106. doi: 10.7717/peerj-cs.1106. eCollection 2022.

DOI:10.7717/peerj-cs.1106

PMID:36262128

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9575871/

Abstract

In many real-world applications of pattern recognition, automatically learning the most appropriate dissimilarity measure for comparing objects is of utmost importance. Real-world objects are often complex entities that need a specific representation built from a composition of heterogeneous features, so Machine Learning algorithms start from a non-metric space. However, in these so-called unconventional spaces a family of dissimilarity measures can still be exploited, namely the set of component-wise dissimilarity measures, in which each component is treated with a specific sub-dissimilarity that depends on the nature of the data at hand. These dissimilarities are likely to be non-Euclidean, hence the underlying dissimilarity matrix is not isometrically embeddable in a standard Euclidean space, because such a space may not be structurally rich enough. On the other hand, in many metric learning problems, a component-wise dissimilarity measure can be defined as a weighted convex linear combination whose weights can be suitably learned. After introducing some hints on the relation between distances and the metric learning paradigm, this article discusses, with some experiments, how weights, intended as mathematical operators, interact with the Euclidean behavior of dissimilarity matrices.
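To make the two central notions concrete, the following is a minimal Python sketch, not the authors' code. It assumes that each sub-dissimilarity is available as a square matrix, combines the matrices with convex weights, and measures how far the result is from Euclidean behavior through the negative eigenvalues of the double-centered squared dissimilarity matrix, an indicator commonly used for this purpose. The function names `combined_dissimilarity` and `negative_eigenfraction`, the toy data `X`, and the choice of indicator are illustrative assumptions rather than details taken from the article.

```python
import numpy as np

def combined_dissimilarity(sub_matrices, weights):
    """Convex combination D = sum_k w_k * D_k of per-component matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Contract the weight vector with the component axis of the stack.
    return np.tensordot(w, np.asarray(sub_matrices, dtype=float), axes=1)

def negative_eigenfraction(D):
    """Share of the Gram spectrum's magnitude carried by negative eigenvalues.

    A value close to 0 indicates that D can be (almost) isometrically
    embedded in a Euclidean space; larger values indicate non-Euclidean behavior.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ (D ** 2) @ J               # Gram matrix from squared dissimilarities
    eig = np.linalg.eigvalsh((G + G.T) / 2)   # symmetrize against round-off
    return float(np.abs(eig[eig < 0]).sum() / np.abs(eig).sum())

# Toy data (hypothetical): four objects described by two numeric components.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# One sub-dissimilarity per component: absolute difference on that component.
D_x = np.abs(X[:, None, 0] - X[None, :, 0])
D_y = np.abs(X[:, None, 1] - X[None, :, 1])

D = combined_dissimilarity([D_x, D_y], weights=[0.5, 0.5])

print("single component:", negative_eigenfraction(D_x))
print("convex combination:", negative_eigenfraction(D))
```

In this toy case each sub-dissimilarity is Euclidean on its own, while their equally weighted combination is a scaled city-block distance on the corners of a square and shows a clearly non-zero negative eigenfraction. Setting the weights to `[1.0, 0.0]` recovers the Euclidean single-component case, which illustrates how the weights affect the Euclidean behavior of the resulting matrix.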
