National Centre for Text Mining (NaCTeM), School of Computer Science, University of Manchester, Manchester, United Kingdom.
IEEE Trans Pattern Anal Mach Intell. 2012 Nov;34(11):2216-32. doi: 10.1109/tpami.2012.20.
This paper addresses supervised and semi-supervised dimensionality reduction (DR) by generating spectral embeddings from multi-output data based on pairwise proximity information. Two flexible and generic frameworks are proposed to achieve supervised DR (SDR) for multilabel classification. The first, referred to as MESD, can extend any existing single-label SDR method to the multilabel setting via sample duplication. The second, referred to as MOPE, is a multilabel design framework that tackles the SDR problem by computing weight (proximity) matrices from feature and label information simultaneously, and it generalizes many current techniques. For MOPE, we propose a diverse set of schemes for computing label-based proximity, as well as a mechanism for combining label-based and feature-based weight information that takes information importance and prioritization into account. Additionally, we summarize many current spectral methods for unsupervised DR (UDR), single- and multilabel SDR, and semi-supervised DR (SSDR), and express them under a common template representation as a general guide for researchers in the field. We also propose a general framework for achieving SSDR by combining existing SDR and UDR models, as well as a procedure for reducing the computational cost by learning with a target set of relation features. The effectiveness of the proposed methodologies is demonstrated through experiments on document collections for multilabel text categorization from the natural language processing domain.
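The two frameworks described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation of MESD or MOPE: the heat-kernel feature weights, the cosine label weights, the convex mixing parameter `alpha`, and every function name below are illustrative assumptions. The paper proposes several label-based proximity schemes and its own combination mechanism, which may differ. The embedding step shown is plain Laplacian eigenmaps on the combined weight matrix, i.e. it solves $(D - W)\,y = \lambda D\,y$ with $D = \mathrm{diag}(W\mathbf{1})$.

```python
import numpy as np


def duplicate_by_label(X, Y):
    """MESD-style step (sketch): replicate each sample once per assigned label,
    so that any single-label SDR method can be run on (X_dup, y_dup)."""
    rows, labels = np.nonzero(Y)  # one (sample, label) pair per nonzero entry
    return X[rows], labels


def feature_weights(X, sigma=1.0):
    """Heat-kernel proximity between feature vectors (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def label_weights(Y):
    """Cosine similarity between binary label vectors (one hypothetical scheme)."""
    Y = Y.astype(float)
    norms = np.linalg.norm(Y, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for unlabeled rows
    Yn = Y / norms[:, None]
    W = Yn @ Yn.T
    np.fill_diagonal(W, 0.0)
    return W


def combined_weights(X, Y, alpha=0.5, sigma=1.0):
    """MOPE-style proximity (sketch): convex mix of label- and feature-based weights.
    alpha is a hypothetical knob giving priority to label information."""
    return alpha * label_weights(Y) + (1.0 - alpha) * feature_weights(X, sigma)


def spectral_embedding(W, k):
    """Laplacian eigenmaps: solve (D - W) y = lam * D y via the normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Z = vecs[:, 1:k + 1]             # drop the trivial eigenvector (eigenvalue 0)
    return d_inv_sqrt[:, None] * Z   # map back: y = D^{-1/2} z
```

For example, with a small feature matrix `X` and a binary label matrix `Y`, `spectral_embedding(combined_weights(X, Y, alpha=0.5), k=2)` returns a two-dimensional embedding of the training samples. Because this embedding is transductive, embedding unseen documents would require a linear projection variant or an out-of-sample extension, which this sketch does not cover.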