
Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis.

Affiliation

Department of Computer Science and the Center for Evolutionary Medicine and Informatics (CEMI) of The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2011 Jan;33(1):194-200. doi: 10.1109/TPAMI.2010.160.

Abstract

Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction, in which the two sets of variables are derived from the data and the class labels, respectively. It is well known that CCA can be formulated as a least-squares problem in the binary class case. However, the extension to the more general setting remains unclear. In this paper, we show that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem. Based on this equivalence relationship, efficient algorithms for solving least-squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions, including a sparse CCA formulation based on 1-norm regularization. We further extend the least-squares formulation to partial least squares. We also show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights into the effect of regularization on CCA. We have conducted experiments using benchmark data sets. Experiments on multilabel data sets confirm the established equivalence relationships. Results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
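
The equivalence claimed in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation. The abstract does not spell out the "mild condition", so the sketch assumes one concrete setting: a centered d x n data matrix X of rank n - 1 (typical when d is at least n - 1) and a centered k x n label matrix Y of full row rank k. The least-squares target H = Y^T (Y Y^T)^{-1/2}, the function names, and the toy data are all choices made for this illustration. Under these assumptions the CCA projection for X and the minimum-norm least-squares solution should span the same subspace.

```python
import numpy as np


def orthonormal_label_targets(Y):
    """Return H = Y^T (Y Y^T)^{-1/2}, an n x k matrix with orthonormal columns.

    Assumes Y (k x n) is centered and has full row rank k.
    """
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt.T @ U.T


def cca_and_least_squares(X, Y, tol=1e-10):
    """Compute the CCA projection for X and a least-squares solution.

    X is d x n (features by samples), Y is k x n (labels by samples).
    Both are assumed to be centered across samples.
    """
    k = Y.shape[0]
    H = orthonormal_label_targets(Y)

    # Thin SVD of X, truncated to its numerical rank r.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    U, s, Vt = U[:, :r], s[:r], Vt[:r]

    # CCA in whitened coordinates: top-k eigenvectors of the
    # symmetric matrix (V^T H)(V^T H)^T, mapped back through U S^{-1}.
    G = Vt @ H  # r x k
    evals, evecs = np.linalg.eigh(G @ G.T)
    top = np.argsort(evals)[::-1][:k]
    W_cca = (U / s) @ evecs[:, top]  # d x k

    # Minimum-norm solution of min_W ||X^T W - H||_F^2,
    # i.e. pinv(X^T) @ H written via the same SVD.
    W_ls = (U / s) @ G  # d x k

    return W_cca, W_ls, evals[top], r


def column_space_projector(W):
    """Orthogonal projector onto the column space of W."""
    Q, _ = np.linalg.qr(W)
    return Q @ Q.T


# Toy data (hypothetical): more features than samples, three binary labels.
rng = np.random.default_rng(0)
n, d, k = 20, 50, 3
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)  # center each feature

idx = np.arange(n)
Y = np.stack([(idx >> b) & 1 for b in range(k)]).astype(float)
Y -= Y.mean(axis=1, keepdims=True)  # center each label

W_cca, W_ls, squared_corr, r = cca_and_least_squares(X, Y)
same_subspace = np.allclose(
    column_space_projector(W_cca), column_space_projector(W_ls), atol=1e-8
)
print("rank of X:", r)
print("squared canonical correlations:", squared_corr)
print("same subspace:", same_subspace)
```

The comparison uses projectors rather than the matrices themselves because, in this setting, the two solutions can differ by an orthogonal transformation of their columns. Once CCA is posed as a least-squares problem, adding a penalty term to that problem is one natural route to the regularized and sparse variants the abstract mentions, though the sketch does not implement them.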

