Min Sitao, Asif Hafiz, Wang Xinyue, Vaidya Jaideep
Rutgers University, Newark, NJ, USA.
Hofstra University, Long Island, NY, USA.
IEEE Trans Knowl Data Eng. 2025 May;37(5):2266-2281. doi: 10.1109/TKDE.2025.3537403. Epub 2025 Jan 30.
Federated learning (FL), a decentralized machine learning approach, offers strong performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios where missing data heterogeneity exists and the state-of-the-art (SOTA) approaches for federated imputation suffer a significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in the observable and missing data distributions across clients, and that these differences can be leveraged to improve imputation quality. Cafe computes personalized weights that are automatically calibrated to the level of heterogeneity, which can remain unknown, to develop a personalized imputation model for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming the baselines in heterogeneous settings.