College of Engineering, Mathematics and Physical Sciences, University of Exeter, EX4 4QF, UK
Neural Comput. 2014 Mar;26(3):497-522. doi: 10.1162/NECO_a_00556. Epub 2013 Dec 9.
Learning an appropriate (dis)similarity function from the available data is a central problem in machine learning, since the success of many machine learning algorithms critically depends on the choice of a similarity function for comparing examples. Although many approaches to similarity metric learning have been proposed, there has been little theoretical study of the link between similarity metric learning and the classification performance of the resulting classifier. In this letter, we propose regularized similarity learning formulations associated with general matrix norms and establish their generalization bounds. We show that the generalization error of the resulting linear classifier can be bounded by the derived generalization bound of similarity learning. This shows that good generalization of the learned similarity function guarantees good classification by the resulting linear classifier. Our results extend and improve those obtained by Bellet, Habrard, and Sebban (2012). Because the techniques used there depend on the notion of uniform stability (Bousquet & Elisseeff, 2002), the bound obtained in that work holds true only for Frobenius matrix-norm regularization. Our techniques, which use the Rademacher complexity (Bartlett & Mendelson, 2002) and a related Khinchin-type inequality, enable us to establish bounds for regularized similarity learning formulations associated with general matrix norms, including the sparse L1-norm and the mixed (2,1)-norm.
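The matrix norms named above can be made concrete with a short sketch. The snippet below is illustrative only (the matrix M and the bilinear similarity form K_M(x, x') = xᵀMx', a common parameterization in similarity learning, are assumptions, not taken from this letter); it computes the three regularizers the abstract mentions for a small example matrix.

```python
import numpy as np

# Illustrative parameter matrix for a bilinear similarity K_M(x, x') = x.T @ M @ x'
# (a common parameterization in similarity learning; chosen here as an assumption).
M = np.array([[1.0, -2.0],
              [0.0,  3.0]])

# Frobenius norm: sqrt of the sum of squared entries (used by Bellet et al., 2012).
fro_norm = np.linalg.norm(M, "fro")            # sqrt(1 + 4 + 0 + 9) = sqrt(14)

# Sparse L1-norm: sum of absolute values of all entries.
l1_norm = np.abs(M).sum()                      # 1 + 2 + 0 + 3 = 6

# Mixed (2,1)-norm: sum over rows of the Euclidean norms of the rows.
mixed_21_norm = np.linalg.norm(M, axis=1).sum()  # sqrt(5) + 3

# The bilinear similarity itself, on two example points.
x, x_prime = np.array([1.0, 0.0]), np.array([0.0, 1.0])
similarity = x @ M @ x_prime                   # -2.0
```

Regularizing with the L1 or mixed (2,1)-norm instead of the Frobenius norm promotes entrywise or row-wise sparsity in M, which is why bounds covering general matrix norms are of practical interest.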