Chao Guoqing, Sun Shiliang, Bi Jinbo
School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, PR China.
School of Computer Science and Technology, East China Normal University, Shanghai, Shanghai 200062 China.
IEEE Trans Artif Intell. 2021 Apr;2(2):146-168. doi: 10.1109/tai.2021.3065894. Epub 2021 Apr 5.
Clustering is a machine learning paradigm that divides sample subjects into groups such that subjects in the same group are more similar to each other than to those in other groups. With advances in information acquisition technologies, samples can frequently be viewed from different angles or in different modalities, generating multi-view data. Multi-view clustering (MVC), which clusters subjects into subgroups using multi-view data, has attracted increasing attention. Although MVC methods have developed rapidly, few surveys summarize and analyze the current progress. We therefore propose a novel taxonomy of MVC approaches. As with other machine learning methods, we categorize them into generative and discriminative classes. Within the discriminative class, we further split the methods into five groups according to how the views are integrated: Common Eigenvector Matrix, Common Coefficient Matrix, Common Indicator Matrix, Direct Combination, and Combination After Projection. Furthermore, we relate MVC to other topics: multi-view representation, ensemble clustering, multi-task clustering, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated for practitioners. We introduce some benchmark multi-view datasets and empirically evaluate representative MVC algorithms from each group on them. To promote future development of MVC approaches, we point out several open problems that may require further investigation.
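As a rough illustration of what combining views can look like in practice, the sketch below sums normalized per-view distance matrices and clusters the result with a small k-medoids loop. It is loosely in the spirit of the direct-combination group named in the abstract and is not a method taken from the survey. The function names (`pairwise_sq_dists`, `combine_views`, `k_medoids`), the equal view weights, the farthest-first initialization, and the toy data are all assumptions made for this example.

```python
def pairwise_sq_dists(X):
    """Squared Euclidean distance matrix for a list of feature vectors."""
    n = len(X)
    return [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]


def combine_views(views, weights=None):
    """Weighted sum of per-view distance matrices, each scaled to [0, 1]."""
    if weights is None:
        weights = [1.0 / len(views)] * len(views)  # assumption: equal weights
    n = len(views[0])
    combined = [[0.0] * n for _ in range(n)]
    for X, w in zip(views, weights):
        D = pairwise_sq_dists(X)
        scale = max(max(row) for row in D) or 1.0  # avoid division by zero
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * D[i][j] / scale
    return combined


def assign(D, medoids):
    """Label each sample with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, k, n_iter=20):
    """Simple k-medoids on a precomputed distance matrix (deterministic)."""
    n = len(D)
    # Farthest-first initialization, starting from sample 0.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(n_iter):
        labels = assign(D, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda i: sum(D[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(D, medoids)


# Toy data: six samples seen in two views. In view A, sample 2 lies roughly
# midway between the two groups; view B places it clearly with samples 0 and 1.
view_a = [[0.0, 0.1], [0.2, 0.0], [2.6, 2.4], [5.0, 5.1], [4.9, 5.0], [5.2, 4.8]]
view_b = [[0.0], [0.3], [0.1], [10.0], [9.8], [10.1]]

labels = k_medoids(combine_views([view_a, view_b]), k=2)
print(labels)
```

Scaling each view's distances before summing keeps a view with large numeric ranges from dominating the combination. Learned rather than fixed view weights would be a natural refinement, but that goes beyond this sketch.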