AlZuhair Mona Suliman, Ben Ismail Mohamed Maher, Bchir Ouiem
Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia.
Sensors (Basel). 2025 Apr 21;25(8):2622. doi: 10.3390/s25082622.
Semi-supervised clustering is a clustering paradigm that exploits both labeled and unlabeled data to guide the learning of accurate clusters and to avoid poor local minima. Nonetheless, efforts to refine existing semi-supervised clustering methods remain limited compared with the advances seen in current benchmark methods for fully unsupervised clustering. This research introduces a novel semi-supervised deep clustering method that leverages deep neural networks and fuzzy memberships to better capture the data partitions. Specifically, the proposed Dual-Constraint-based Semi-Supervised Deep Clustering (DC-SSDEC) method uses two sets of pairwise soft constraints, "should-link" and "shouldNot-link", to guide the clustering process. The clustering task is formulated as the optimization of a newly designed objective function. DC-SSDEC was evaluated through comprehensive experiments on three real-world and benchmark datasets and compared with related state-of-the-art clustering techniques. The significance of DC-SSDEC lies in the proposed dual-constraint formulation and its integration into a novel objective function, which improved clustering performance over relevant state-of-the-art approaches. The assessment of the proposed model on real-world datasets is a further contribution of this research. DC-SSDEC increased clustering accuracy by 3.25%, 1.44%, and 1.82% over the best-performing single-constraint-based approach on the MNIST, STL-10, and USPS datasets, respectively.
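The abstract does not state the DC-SSDEC objective function, so the following is only an illustrative sketch of how pairwise soft constraints can be combined with fuzzy memberships. It assumes a penalty form common in semi-supervised fuzzy clustering: a "should-link" pair is penalized by the membership mass the two points place in different clusters, and a "shouldNot-link" pair by the mass they place in the same cluster. The function name, the penalty form, and the membership values are all hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]


def pairwise_constraint_penalty(
    memberships: Sequence[Sequence[float]],
    should_link: Sequence[Pair],
    should_not_link: Sequence[Pair],
) -> float:
    """Soft penalty for violating pairwise constraints (assumed form).

    memberships[i][k] is the fuzzy membership of point i in cluster k.
    This is NOT the DC-SSDEC objective; it only illustrates the idea.
    """
    penalty = 0.0
    for i, j in should_link:
        # Mass placed in the same cluster: sum_k u_ik * u_jk
        same = sum(a * b for a, b in zip(memberships[i], memberships[j]))
        # Mass over all cluster pairs: (sum_k u_ik) * (sum_l u_jl)
        total = sum(memberships[i]) * sum(memberships[j])
        # Mass placed in different clusters: sum over k != l of u_ik * u_jl
        penalty += total - same
    for i, j in should_not_link:
        # Penalize mass placed in the same cluster
        penalty += sum(a * b for a, b in zip(memberships[i], memberships[j]))
    return penalty


# Hypothetical memberships for three points over two clusters.
u = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

# Constraints that agree with the memberships give a small penalty ...
consistent = pairwise_constraint_penalty(u, [(0, 1)], [(0, 2)])
# ... while constraints that contradict them give a larger one.
violated = pairwise_constraint_penalty(u, [(0, 2)], [(0, 1)])
print(consistent < violated)
```

In a deep clustering setting, a term of this kind would typically be added, with a weighting factor, to an unsupervised clustering loss computed on the network's embeddings. Whether DC-SSDEC weights or combines its two constraint sets in this way is not specified in the abstract.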