Najari Shaghayegh, Salehi Mostafa, Farahbakhsh Reza
Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran.
School of Computer Science, Institute for Research in Fundamental Science (IPM), P.O.Box 19395-5746, Tehran, Iran.
Soc Netw Anal Min. 2022;12(1):4. doi: 10.1007/s13278-021-00800-9. Epub 2021 Nov 14.
A massive number of people now use social media, which lets organizations and institutions reach audiences across the globe more easily. Some of them use social bots as automated entities to gain intangible access to, and influence over, their users through faster content propagation. As a result, malicious social bots are increasingly common and fool humans with unrealistic behavior and content, so it is necessary to distinguish these fake social accounts from real ones. Multiple approaches to this problem have been investigated in the literature. Statistical machine learning methods are one of them; they focus on handcrafted features that represent the characteristics of social bots. Although they achieve successful results in some cases, they rely on the bots' known behavior and fail when bots change their behavioral patterns. On the other hand, more advanced deep neural network-based methods aim to overcome this limitation. The generative adversarial network (GAN), a newer technology from this domain, is a semi-supervised method that has been shown to extract the behavioral patterns of the data. In this work, we use a GAN to leak more information about bot samples to the state-of-the-art textual bot detection method, the contextual LSTM. Although a GAN can augment scarce labeled data, the original textual GAN has a known convergence limitation. In this paper, we investigate this limitation and customize the GAN idea in a new framework in which the generator and classifier are connected by an LSTM layer that serves as a shared channel between them. Our experimental results on a benchmark dataset of Twitter social bots show that the proposed framework outperforms the existing contextual LSTM method by increasing bot detection probabilities.
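The shared-channel idea in the abstract, a generator and a classifier connected through one LSTM layer, can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the class names (`SharedLSTM`, `Generator`, `Classifier`), the layer sizes, the random weights, and the toy input are all assumptions made for illustration, and adversarial training is omitted entirely. The sketch only shows that both heads read from the same LSTM parameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SharedLSTM:
    """Hypothetical single-layer LSTM in plain Python (forward pass only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {name: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                         for _ in range(hidden_size)]
                  for name in ("i", "f", "o", "g")}
        self.b = {name: [0.0] * hidden_size for name in ("i", "f", "o", "g")}

    def _gate(self, name, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate input vector and previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        cand = self._gate("g", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence):
        """Run the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


class Generator:
    """Assumed generator head: next-token distribution from the shared state."""

    def __init__(self, shared, vocab_size, rng):
        self.shared = shared
        self.out = [[rng.uniform(-0.5, 0.5) for _ in range(shared.hidden_size)]
                    for _ in range(vocab_size)]

    def next_token_probs(self, sequence):
        h = self.shared.encode(sequence)
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.out]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]


class Classifier:
    """Assumed classifier head: bot probability from the same shared state."""

    def __init__(self, shared, rng):
        self.shared = shared
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(shared.hidden_size)]

    def bot_probability(self, sequence):
        h = self.shared.encode(sequence)
        return sigmoid(sum(w * v for w, v in zip(self.w, h)))


rng = random.Random(0)
shared = SharedLSTM(input_size=3, hidden_size=4, rng=rng)
generator = Generator(shared, vocab_size=5, rng=rng)
classifier = Classifier(shared, rng=rng)

# Toy "tweet": two made-up 3-dimensional token embeddings.
tweet = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
probs = generator.next_token_probs(tweet)
p_bot = classifier.bot_probability(tweet)
```

Because `generator.shared` and `classifier.shared` are the same object, any update to the LSTM weights made while training one head would also change what the other head sees, which is the sense in which the layer acts as a shared channel in this sketch.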