Lee Kiyoung, Kim Dae-Won, Lee Kwang H, Lee Doheon
IEEE Trans Neural Netw. 2007 Jan;18(1):284-9. doi: 10.1109/TNN.2006.884673.
The purpose of data description is to give a compact description of the target data that represents most of its characteristics. In support vector data description (SVDD), this compact description is a hyperspherical model determined by a small portion of the data, called support vectors. Although the conventional SVDD is useful, it may fail to find the optimal description of the target data, especially when the support vectors do not reflect the overall characteristics of the target data. To address this issue, we propose a new SVDD that introduces new distance measurements based on the notion of a relative density degree for each data point, so that the description reflects the distribution of the given data set. Moreover, as a real application, we extend the proposed method to the protein localization prediction problem, which is a multiclass and multilabel problem. Experiments with various real data sets show promising results.
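To make the idea of a per-point relative density degree more concrete, the following is a minimal sketch and not the authors' formulation. The abstract does not give the formula, so the score used here (the exponential of the mean k-nearest-neighbour distance divided by the point's own k-nearest-neighbour distance) is an assumption. The function names, the parameters `k` and `omega`, and the toy data are all hypothetical. The sketch does not solve the SVDD optimization problem. It only shows how a density score could down-weight an isolated point when estimating the centre of a description.

```python
import math

def knn_distance(points, i, k):
    """Euclidean distance from points[i] to its k-th nearest neighbour."""
    dists = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def relative_density_degrees(points, k=2, omega=1.0):
    """Assumed density score: exp(omega * mean_knn / knn_i).

    Points in dense regions have a small k-NN distance and therefore a
    large score. Assumes no duplicate points (k-NN distance > 0).
    """
    knn = [knn_distance(points, i, k) for i in range(len(points))]
    mean_knn = sum(knn) / len(knn)
    return [math.exp(omega * mean_knn / d) for d in knn]

def density_weighted_center(points, rho):
    """Weighted mean of the points, using the density scores as weights."""
    total = sum(rho)
    dim = len(points[0])
    return tuple(sum(r * p[d] for r, p in zip(rho, points)) / total
                 for d in range(dim))

# Hypothetical toy data: a tight cluster of four points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
rho = relative_density_degrees(points, k=2)
center = density_weighted_center(points, rho)
plain_mean = tuple(sum(p[d] for p in points) / len(points) for d in range(2))
```

In this toy example the isolated point receives the smallest score, so the density-weighted centre stays near the cluster while the unweighted mean is pulled toward the isolated point. A density-aware SVDD would presumably use such scores inside the distance constraints of the optimization itself rather than in a simple weighted mean.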