Liu Xinwang, Zhu Xinzhong, Li Miaomiao, Wang Lei, Zhu En, Liu Tongliang, Kloft Marius, Shen Dinggang, Yin Jianping, Gao Wen
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1191-1204. doi: 10.1109/TPAMI.2019.2892416. Epub 2019 Jan 14.
Multiple kernel clustering (MKC) algorithms optimally combine a group of pre-specified base kernel matrices to improve clustering performance. However, existing MKC algorithms cannot efficiently handle the situation where some rows and columns of the base kernel matrices are absent. This paper proposes two simple yet effective algorithms to address this issue. Unlike existing approaches, which first impute the incomplete kernel matrices and then apply a standard MKC algorithm to the imputed matrices, our first algorithm integrates imputation and clustering into a unified learning procedure. Specifically, we perform multiple kernel clustering directly in the presence of incomplete kernel matrices, which are treated as auxiliary variables to be jointly optimized. Our algorithm does not require at least one base kernel matrix to be complete over all the samples. It also adaptively imputes the incomplete kernel matrices and combines them to best serve clustering. We further improve this algorithm by encouraging the incomplete kernel matrices to mutually complete each other. A three-step iterative algorithm is designed to solve the resultant optimization problems, and we theoretically study the generalization bound of the proposed algorithms. Extensive experiments on 13 benchmark data sets compare the proposed algorithms with existing imputation-based methods. Our algorithms consistently achieve superior performance, and the improvement becomes more significant as the missing ratio increases, verifying the effectiveness and advantages of the proposed joint imputation and clustering.
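The joint procedure described for the first algorithm can be illustrated with a minimal NumPy sketch of a three-step alternating loop. The abstract does not spell out the objective or the update rules, so everything concrete here is an assumption made for illustration: the objective Tr(K_beta (I - H H^T)) with K_beta = sum_p beta_p^2 K_p, the closed-form imputation step, the inverse-trace weight update, the zero-fill initialization, and all function and parameter names (`incomplete_mkkm`, `impute_kernel`, `ridge`, `n_iter`) are hypothetical. The mutual-completion variant (the second algorithm) is not covered.

```python
import numpy as np


def top_k_eigvecs(K, k):
    """Return the k eigenvectors of symmetric K with the largest eigenvalues."""
    _, V = np.linalg.eigh((K + K.T) / 2)  # eigh sorts eigenvalues ascending
    return V[:, -k:]


def impute_kernel(K_obs, obs, H, ridge=1e-6):
    """Fill the missing rows/columns of one base kernel for a fixed H.

    Assumed subproblem: minimise Tr(K (I - H H^T)) over the missing entries,
    keeping the observed block equal to K_obs and K positive semidefinite.
    """
    n = H.shape[0]
    mis = np.setdiff1d(np.arange(n), obs)
    K = np.zeros((n, n))
    K[np.ix_(obs, obs)] = K_obs
    if mis.size == 0:
        return K
    U = np.eye(n) - H @ H.T
    U_cm = U[np.ix_(obs, mis)]
    # A small ridge guards against a singular U_mm (an assumption, for stability).
    U_mm = U[np.ix_(mis, mis)] + ridge * np.eye(mis.size)
    Z = -np.linalg.solve(U_mm, U_cm.T).T  # equals -U_cm @ inv(U_mm)
    K_cm = K_obs @ Z
    K_mm = Z.T @ K_obs @ Z
    K[np.ix_(obs, mis)] = K_cm
    K[np.ix_(mis, obs)] = K_cm.T
    K[np.ix_(mis, mis)] = (K_mm + K_mm.T) / 2  # remove rounding asymmetry
    return K


def incomplete_mkkm(K_obs_list, obs_list, n, k, n_iter=20, eps=1e-12):
    """Alternate between (1) clustering, (2) imputation, (3) kernel weights."""
    m = len(K_obs_list)
    beta = np.full(m, 1.0 / m)
    # Initialise every base kernel by zero-filling its missing entries.
    kernels = []
    for K_obs, obs in zip(K_obs_list, obs_list):
        K = np.zeros((n, n))
        K[np.ix_(obs, obs)] = K_obs
        kernels.append(K)
    for _ in range(n_iter):
        # Step 1: relaxed kernel k-means on the combined kernel.
        K_beta = sum(b ** 2 * K for b, K in zip(beta, kernels))
        H = top_k_eigvecs(K_beta, k)
        # Step 2: re-impute each incomplete kernel given H.
        kernels = [impute_kernel(Ko, o, H)
                   for Ko, o in zip(K_obs_list, obs_list)]
        # Step 3: weights minimising sum_p beta_p^2 a_p subject to sum(beta) = 1.
        U = np.eye(n) - H @ H.T
        a = np.array([np.sum(K * U) for K in kernels])  # Tr(K U), K and U symmetric
        inv_a = 1.0 / np.maximum(a, eps)
        beta = inv_a / inv_a.sum()
    return H, beta, kernels
```

Under these assumptions, `K_obs_list[p]` holds the observed block of the p-th base kernel and `obs_list[p]` the indices of the samples it covers; no kernel needs to cover every sample. Cluster labels would then typically be obtained by running k-means on the rows of the returned `H`, which this sketch leaves out to stay dependency-free.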