IEEE Trans Cybern. 2020 Apr;50(4):1595-1606. doi: 10.1109/TCYB.2018.2877161. Epub 2018 Nov 2.
Spammers, who manipulate online reviews to promote or suppress products, are flooding online commerce. To combat this trend, a great deal of research has focused on detecting review spammers, most of which designs diverse features and builds classifiers on them. The widespread growth of crowdsourcing platforms has created a large population of deceptive review writers who behave more like normal users and can therefore more easily evade classifiers that rely purely on fixed characteristics. In this paper, we propose a hybrid semisupervised learning model, called hybrid PU-learning-based spammer detection (hPSD), that leverages both the users' characteristics and the user-product relations. Specifically, the hPSD model can iteratively detect multiple types of spammers by injecting different positive samples, and it allows classifiers to be constructed in a semisupervised hybrid learning framework. Comprehensive experiments on a movie dataset with shilling injection confirm the superior performance of hPSD over existing baseline methods. hPSD is then used to detect hidden spammers in real-life Amazon data. A set of spammers and their underlying employers (e.g., book publishers) are successfully discovered and validated. These results demonstrate that hPSD is suited to real-world application scenarios and can thus effectively detect potentially deceptive review writers.
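To make the idea of iterative PU (positive-unlabeled) learning concrete, the following is a minimal illustrative sketch, not the hPSD model itself. It assumes each user is described by a small feature vector, starts from a few known spammers (the injected positive samples), treats every other user as unlabeled, and repeatedly (1) picks reliable negatives from the unlabeled set and (2) promotes unlabeled users that sit closer to the spammer centroid than to the negative centroid. The function name `pu_detect`, the nearest-centroid rule, the two features, and the toy data are all assumptions made for illustration; the abstract does not specify these details, and the sketch ignores the user-product relations that hPSD also uses.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def pu_detect(features, seed_positives, max_rounds=10):
    """Toy iterative PU learning (hypothetical helper, not the paper's algorithm).

    features: dict mapping user id -> tuple of floats
    seed_positives: set of user ids known to be spammers
    Returns the set of user ids labeled as spammers after iteration.
    """
    positives = set(seed_positives)
    for _ in range(max_rounds):
        unlabeled = [u for u in features if u not in positives]
        if not unlabeled:
            break
        pos_c = centroid([features[u] for u in positives])
        unl_c = centroid([features[u] for u in unlabeled])
        # Step 1: reliable negatives are unlabeled users that lie closer to
        # the unlabeled centroid than to the positive centroid.
        reliable_neg = [
            u for u in unlabeled
            if dist(features[u], unl_c) < dist(features[u], pos_c)
        ]
        if not reliable_neg:
            break
        neg_c = centroid([features[u] for u in reliable_neg])
        # Step 2: nearest-centroid classification of the unlabeled users.
        new_pos = {
            u for u in unlabeled
            if dist(features[u], pos_c) < dist(features[u], neg_c)
        }
        if not new_pos:
            break  # converged: no new spammers found this round
        positives |= new_pos
    return positives


# Made-up features: (rating deviation, review burstiness), both in [0, 1].
toy_features = {
    "s1": (0.90, 0.90), "s2": (0.85, 0.95), "s3": (0.80, 0.90), "s4": (0.95, 0.85),
    "n1": (0.10, 0.20), "n2": (0.20, 0.10), "n3": (0.15, 0.15),
    "n4": (0.10, 0.10), "n5": (0.20, 0.20), "n6": (0.05, 0.15),
}
detected = pu_detect(toy_features, seed_positives={"s1"})
print(sorted(detected))  # ['s1', 's2', 's3', 's4']
```

Starting from a single labeled spammer, the loop recovers the other three users in the same cluster and then stops because no further unlabeled user is closer to the spammer centroid. Detecting a different spammer type would, under the same assumptions, amount to calling `pu_detect` again with a different seed set.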