Wang Xiujuan, Chen Kangmiao, Wang Keke, Wang Zhengxiang, Zheng Kangfeng, Zhang Jiayue
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Sensors (Basel). 2024 May 28;24(11):3481. doi: 10.3390/s24113481.
Malicious social bots pose a serious threat to social network security by spreading false information and steering public opinion. Because the data held by any single organization is homogeneous and scarce, and because labeling social bots is costly, there is growing interest in federated models that combine federated learning with social bot detection. In this paper, we first combine the federated learning framework with the Relational Graph Convolutional Neural Network (RGCN) model to achieve federated social bot detection. A class-level cross-entropy loss function is applied during local model training to mitigate the effects of class imbalance in local data. To address data heterogeneity across participants, we optimize the classical federated learning algorithm by applying knowledge distillation. Specifically, we adjust the client-side and server-side models separately: on the client side, a global generator is trained to produce pseudo-samples from knowledge of the local data distributions, which corrects the optimization direction of the client-side classification models; on the server side, the knowledge of the client-side classification models is integrated to guide the training of the global classification model. We conduct extensive experiments on widely used datasets, and the results demonstrate the effectiveness of our approach for social bot detection in heterogeneous data scenarios. Compared to baseline methods, our approach improves detection accuracy by roughly 3-10% when data heterogeneity is high. Additionally, our method reaches the specified accuracy in the fewest communication rounds.