Practical detection of spammers and content promoters in online video sharing systems.

Authors

Benevenuto Fabrício, Rodrigues Tiago, Veloso Adriano, Almeida Jussara, Gonçalves Marcos, Almeida Virgílio

Affiliation

Computer Science Department, Federal University of Ouro Preto, Ouro Preto, MG, Brazil.

Publication information

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):688-701. doi: 10.1109/TSMCB.2011.2173799. Epub 2011 Nov 30.

Abstract

A number of online video sharing systems, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood of the response being viewed by a larger number of users. Moreover, content promoters may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize users' trust in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a deep understanding of this problem. In this paper, we address the issue of detecting video spammers and promoters. To that end, we first manually build a test collection of real YouTube users, classifying them as spammers, promoters, and legitimate users. Using our test collection, we provide a characterization of content, individual, and social attributes that help distinguish each user class. We then investigate the feasibility of using supervised classification algorithms to automatically detect spammers and promoters, and assess their effectiveness on our test collection. While our classification approach succeeds at separating spammers and promoters from legitimate users, the high cost of manually labeling large numbers of examples compromises its full potential in realistic scenarios. For this reason, we further propose an active learning approach that automatically chooses the set of examples to label that is likely to provide the most information, drastically reducing the amount of required training data while maintaining comparable classification effectiveness.
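
The abstract does not say how the active learning step picks examples or which classifier it uses, so the following is only a generic illustration of the idea, not the authors' method. It sketches pool-based active learning with uncertainty sampling on top of a nearest-centroid classifier. The two-dimensional feature vectors, the `oracle` function standing in for a human annotator, and every function name here are hypothetical.

```python
import math

def fit(labeled):
    """Nearest-centroid model: one mean feature vector per class."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {
        y: tuple(sum(col) / len(xs) for col in zip(*xs))
        for y, xs in by_class.items()
    }

def predict(x, centroids):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def margin(x, centroids):
    """Gap between the two smallest centroid distances; small means uncertain.

    Assumes the model already holds at least two classes.
    """
    d = sorted(math.dist(x, c) for c in centroids.values())
    return d[1] - d[0]

def active_learn(seed, pool, oracle, budget):
    """Ask the oracle to label the `budget` most uncertain pool examples."""
    labeled, pool = list(seed), list(pool)
    for _ in range(budget):
        if not pool:
            break
        centroids = fit(labeled)
        # Uncertainty sampling: query the example nearest the decision boundary.
        query = min(pool, key=lambda x: margin(x, centroids))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit(labeled), labeled

# Hypothetical data: two made-up attributes per user, scaled to [0, 1].
seed = [((0.1, 0.2), "legitimate"), ((0.9, 0.8), "spammer")]
pool = [(0.2, 0.1), (0.45, 0.5), (0.8, 0.9), (0.55, 0.6), (0.15, 0.3)]

def oracle(x):
    """Stand-in for the human annotator (an assumption for this sketch)."""
    return "spammer" if x[0] + x[1] > 1.0 else "legitimate"

model, labeled = active_learn(seed, pool, oracle, budget=2)
print("queried:", labeled[len(seed):])
print("prediction for (0.95, 0.9):", predict((0.95, 0.9), model))
```

With a budget of two queries, the loop spends its labels on the two users closest to the boundary between the classes rather than on users the current model already separates easily, which is the sense in which such an approach can cut labeling cost.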

