Liu Hongquan, Mi Yuxi, Tang Yateng, Guan Jihong, Zhou Shuigeng
Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China.
WeChat, Tencent Inc., Shenzhen, China.
Neural Netw. 2025 Aug;188:107440. doi: 10.1016/j.neunet.2025.107440. Epub 2025 Apr 4.
Semi-supervised federated learning (SSFL) has emerged as a promising paradigm for reducing the need for fully labeled data when training federated learning (FL) models. This paper focuses on the label-at-server scenario, where clients' data are entirely unlabeled and the server possesses only a limited amount of labeled data. In this setting, non-independent and identically distributed (non-IID) local data and incorrect pseudo-labels may introduce bias into the model during local training. Prior works attempt to alleviate this bias by fine-tuning the global model with the clean labeled data, but neglect to explicitly leverage server-side knowledge to guide local training. Additionally, existing methods typically discard samples with unconfident pseudo-labels, leaving many samples unused and consequently yielding suboptimal performance and slow convergence. This paper introduces a novel method that enhances SSFL performance by effectively exploiting server-side clean knowledge and client-side unconfident samples. Specifically, we propose a representation alignment module that mitigates the influence of non-IID data by aligning local features with class proxies derived from the server's labeled data. Furthermore, we employ a shrink loss to reduce the risk associated with unreliable pseudo-labels, ensuring that the valuable information contained in the entire unlabeled dataset is exploited. Extensive experiments on five benchmark datasets under various settings demonstrate the effectiveness and generality of the proposed method, which not only outperforms existing methods but also reduces the communication cost required to reach target performance.
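To make the two ideas named in the abstract concrete, below is a minimal PyTorch sketch of (a) class proxies computed on the server's labeled data and used to align client features, and (b) a confidence-weighted pseudo-label loss that keeps, rather than discards, unconfident samples. The abstract does not give the actual loss definitions, so the specific forms here (cosine alignment to per-class mean features, confidence-scaled cross-entropy as the "shrink" term) and the helper names `class_proxies`, `client_losses`, and the threshold `tau` are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: the exact losses in the paper are not specified in
# the abstract; the forms below are assumptions chosen to show the structure.
import torch
import torch.nn.functional as F


def class_proxies(server_features, server_labels, num_classes):
    """Server side: one proxy per class as the mean feature of the clean labeled data."""
    dim = server_features.size(1)
    proxies = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        mask = server_labels == c
        if mask.any():
            proxies[c] = server_features[mask].mean(dim=0)
    # Normalized proxies would be broadcast to clients along with the global model.
    return F.normalize(proxies, dim=1)


def client_losses(features, logits, proxies, tau=0.95):
    """Client side: proxy alignment plus a shrink-style loss on pseudo-labeled data."""
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)  # pseudo-labels and their confidences

    # Representation alignment (assumed form): pull each local feature toward the
    # proxy of its pseudo-labeled class via cosine distance, countering non-IID drift.
    feats = F.normalize(features, dim=1)
    align_loss = (1.0 - (feats * proxies[pseudo]).sum(dim=1)).mean()

    # Shrink loss (assumed form): instead of dropping samples below the confidence
    # threshold tau, keep them but scale their cross-entropy by the confidence,
    # so the whole unlabeled set contributes while unreliable labels are downweighted.
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    weight = torch.where(conf >= tau, torch.ones_like(conf), conf)
    shrink_loss = (weight * ce).mean()

    return align_loss, shrink_loss
```

In this sketch the client's local objective would combine the two terms, e.g. `align_loss + shrink_loss`; any weighting between them is a further design choice not stated in the abstract.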