IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5345-5361. doi: 10.1109/TPAMI.2024.3367412. Epub 2024 Jul 2.
Federated human activity recognition (FHAR) has attracted much attention due to its great potential for privacy protection. Existing FHAR methods can collaboratively learn a global activity recognition model from unimodal or multimodal data distributed across different local clients. However, it remains unclear whether existing methods work well in a more common scenario where local data come from different modalities, e.g., some local clients may provide motion signals while others can only provide visual data. In this article, we study a new problem, cross-modal federated human activity recognition (CM-FHAR), which is conducive to promoting the large-scale use of HAR models on more local devices. CM-FHAR poses at least three distinct challenges: 1) distributed learning of common cross-modal features, 2) learning of modality-dependent discriminative features, and 3) modality imbalance. To address these challenges, we propose a modality-collaborative activity recognition network (MCARN), which learns both a global activity classifier shared across all clients and multiple modality-dependent private activity classifiers. To produce modality-agnostic and modality-specific features, we learn an altruistic encoder and an egocentric encoder under the constraints of a separation loss and an adversarial modality discriminator that is collaboratively learned in a hypersphere. To address the modality imbalance issue, we propose an angular margin adjustment scheme that improves the modality discriminator on modality-imbalanced data by enhancing the intra-modality compactness of the dominant modality and increasing the inter-modality discrepancy. Moreover, we propose a relation-aware global-local calibration mechanism that constrains class-level pairwise relationships for the parameters of the private classifiers. Finally, through decentralized optimization that alternates between adversarial local updating and modality-aware global aggregation, the proposed MCARN obtains state-of-the-art performance on both modality-balanced and modality-imbalanced data.
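The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of one plausible reading of "modality-aware global aggregation": parameters shared by all clients are averaged over every client, while the modality-dependent private classifier parameters are averaged only among clients of the same modality. The sample-count weighting (FedAvg-style), the function names, and the dictionary layout are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def weighted_average(vectors, weights):
    """Return the weighted mean of equal-length parameter vectors."""
    total = float(sum(weights))
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def modality_aware_aggregate(client_updates):
    """Aggregate one round of client updates (hypothetical scheme).

    Each update is a dict with the assumed keys:
      "modality":    str, e.g. "motion" or "visual"
      "num_samples": int, local training set size, used as the weight
      "shared":      list of floats, parameters shared by all clients
      "private":     list of floats, modality-dependent classifier parameters
    """
    # Shared parameters: sample-weighted average over every client.
    weights = [u["num_samples"] for u in client_updates]
    shared = weighted_average([u["shared"] for u in client_updates], weights)

    # Private parameters: average only within each modality group.
    groups = defaultdict(list)
    for u in client_updates:
        groups[u["modality"]].append(u)
    private = {
        modality: weighted_average(
            [u["private"] for u in members],
            [u["num_samples"] for u in members],
        )
        for modality, members in groups.items()
    }
    return shared, private


# Toy round: two motion-sensor clients and one visual client.
updates = [
    {"modality": "motion", "num_samples": 100, "shared": [1.0, 2.0], "private": [0.0, 4.0]},
    {"modality": "motion", "num_samples": 300, "shared": [3.0, 2.0], "private": [4.0, 0.0]},
    {"modality": "visual", "num_samples": 100, "shared": [5.0, 2.0], "private": [8.0, 8.0]},
]
shared, private = modality_aware_aggregate(updates)
```

In this toy round the visual client is the only member of its group, so its private parameters pass through unchanged, while the two motion clients are merged in proportion to their sample counts. A real system would aggregate full model state (tensors per layer) rather than flat lists.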