Boongoen T, Iam-On N
IEEE Trans Syst Man Cybern B Cybern. 2011 Dec;41(6):1705-14. doi: 10.1109/TSMCB.2011.2160341. Epub 2011 Jul 28.
The measure of data reliability has recently proven useful for a number of data analysis tasks. This paper extends the underlying metric to a new problem: soft subspace clustering. Subspace clustering is increasingly recognized as an effective alternative to conventional algorithms, which search for clusters without differentiating the significance of individual data attributes. While a large number of crisp subspace approaches have been proposed, only a handful of soft counterparts have been developed, all with the common goal of acquiring optimal cluster-specific dimension weights. Most soft subspace clustering methods are built on k-means and rely heavily on the iteratively disclosed cluster centers to determine local weights. Unlike such wrapper techniques, this paper presents a filter approach that is efficient and generally applicable to different types of clustering. Systematic experimental evaluations have been carried out over a collection of published gene expression data sets. The results demonstrate that the reliability-based methods generally enhance their corresponding baseline models and outperform several well-known subspace clustering algorithms.
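As a rough illustration of the wrapper-style weighting that the abstract contrasts against, the sketch below derives cluster-specific dimension weights from cluster centers for a fixed assignment of points to clusters. It is not the reliability-based filter method of the paper, whose details the abstract does not give. The exponential (entropy-style) weighting rule, the function names `cluster_centers` and `local_weights`, the parameter `gamma`, and the toy data are all assumptions made for illustration.

```python
import math


def cluster_centers(points, labels, k):
    """Return the per-dimension mean of each cluster's members."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for x, l in zip(points, labels):
        counts[l] += 1
        for j in range(dims):
            sums[l][j] += x[j]
    # Empty clusters (not expected here) fall back to a zero center.
    return [
        [s / counts[l] if counts[l] else 0.0 for s in sums[l]]
        for l in range(k)
    ]


def local_weights(points, labels, centers, gamma=1.0):
    """Assumed entropy-style rule: within each cluster, a dimension with
    small dispersion around the center receives a large weight.
    Each cluster's weights sum to 1."""
    k, dims = len(centers), len(centers[0])
    disp = [[0.0] * dims for _ in range(k)]
    for x, l in zip(points, labels):
        for j in range(dims):
            disp[l][j] += (x[j] - centers[l][j]) ** 2
    weights = []
    for l in range(k):
        shift = min(disp[l])  # subtract the minimum for numerical stability
        e = [math.exp(-(d - shift) / gamma) for d in disp[l]]
        total = sum(e)
        weights.append([v / total for v in e])
    return weights


# Toy data (hypothetical): cluster 0 is tight in dimension 0,
# cluster 1 is tight in dimension 1.
points = [(1.0, 0.0), (1.1, 5.0), (0.9, 10.0),
          (0.0, 4.0), (6.0, 4.1), (12.0, 3.9)]
labels = [0, 0, 0, 1, 1, 1]

centers = cluster_centers(points, labels, k=2)
weights = local_weights(points, labels, centers, gamma=1.0)
```

In a full wrapper algorithm, this weight update would alternate with reassigning points by weighted distance and recomputing centers until convergence, which is why the weights depend on the iteratively disclosed centers. A filter approach, by contrast, would fix the weights before any clustering is run.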