Unifying complete and incomplete multi-view clustering through an information-theoretic generative model.

Authors

Zheng Yanghang, Zhou Guoxu, Huang Haonan, Luo Xintao, Huang Zhenhao, Zhao Qibin

Affiliations

School of Automation, Guangdong University of Technology, Guangzhou, 510006, China; Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangzhou, 510006, China.

School of Automation, Guangdong University of Technology, Guangzhou, 510006, China; Key Laboratory of Intelligent Detection and The Internet of Things in Manufacturing, Ministry of Education, Guangzhou, 510006, China.

Publication information

Neural Netw. 2025 Feb;182:106901. doi: 10.1016/j.neunet.2024.106901. Epub 2024 Nov 22.

Abstract

Recently, Incomplete Multi-View Clustering (IMVC) has become a rapidly growing research topic, driven by the prevalent issue of incomplete data in real-world applications. Although many approaches have been proposed to address this challenge, most methods did not provide a clear explanation of the learning process for recovery. Moreover, most of them only considered the inter-view relationships, without taking into account the relationships between samples. The influence of irrelevant information is usually ignored, which has prevented them from achieving optimal performance. To tackle the aforementioned issues, we aim at unifying compLete and incOmplete multi-view clusterinG through an Information-theoretiC generative model (LOGIC). Specifically, we have defined three principles based on information theory: comprehensiveness, consensus, and compressibility. We first explain that the essence of learning to recover missing views is to maximize the mutual information between the common representation and the data from each view. Secondly, we leverage the consensus principle to maximize the mutual information between view distributions to uncover the associations between different samples. Finally, guided by the principle of compressibility, we remove as much task-irrelevant information as possible to ensure that the common representation effectively extracts semantic information. Furthermore, it can serve as a plug-and-play missing-data recovery module for multi-view clustering models. Through extensive empirical studies, we have demonstrated the effectiveness of our approach in generating missing views. In clustering tasks, our method consistently outperforms state-of-the-art (SOTA) techniques in terms of accuracy, normalized mutual information and purity, showcasing its superiority in both recovery and clustering performance.
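The abstract reports results in terms of accuracy, normalized mutual information, and purity. As a reading aid, the sketch below shows one common way to compute these three clustering metrics in plain Python. It is not taken from the paper: the function names are hypothetical, the NMI normalization (arithmetic mean of the two entropies) is an assumption, and the brute-force label matching used for accuracy is only practical for a small number of clusters, where a real evaluation would typically use the Hungarian algorithm instead.

```python
import math
from collections import Counter
from itertools import permutations


def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies (assumed normalization)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force, small k only)."""
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    # Pad with placeholders so every cluster can be assigned something.
    classes += [None] * max(0, len(clusters) - len(classes))
    joint = Counter(zip(y_true, y_pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(joint[(t, p)] for t, p in zip(perm, clusters))
        best = max(best, hits)
    return best / len(y_true)


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary; one sample is misplaced
    print("ACC   ", clustering_accuracy(y_true, y_pred))
    print("NMI   ", nmi(y_true, y_pred))
    print("Purity", purity(y_true, y_pred))
```

All three metrics are invariant to how cluster ids are numbered, which is why accuracy needs an explicit matching step between predicted clusters and ground-truth classes.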
