
Analyzing and learning the language for different types of harassment.

Affiliations

University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

University of Dayton, Dayton, Ohio, United States of America.

Publication

PLoS One. 2020 Mar 27;15(3):e0227330. doi: 10.1371/journal.pone.0227330. eCollection 2020.

Abstract

THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content, and its negative impact, call for robust automatic detection approaches. This requires the identification of different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing between five contextual types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated corpus from Twitter distinguishing these types of harassment. We study the context of each kind to shed light on the linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis, and the statistical distribution of uni-grams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
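The per-type unigram analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy corpus, its labels, and the type names are placeholders mirroring the paper's five contextual types, and a real pipeline would use the annotated Twitter corpus and proper tokenization.

```python
from collections import Counter

# Invented placeholder examples, not data from the study; labels mirror
# the paper's five contextual types (sexual, racial, appearance-related,
# intellectual, political).
corpus = [
    ("you look ugly and gross", "appearance"),
    ("you are so dumb and stupid", "intellectual"),
    ("what a dumb take on the election", "political"),
]

def unigram_distributions(corpus):
    """Count unigrams separately per contextual type.

    Returns a dict mapping each type label to a Counter of
    lowercased whitespace tokens, i.e. the per-type statistical
    distribution of unigrams the paper analyzes.
    """
    dists = {}
    for text, label in corpus:
        dists.setdefault(label, Counter()).update(text.lower().split())
    return dists

dists = unigram_distributions(corpus)
```

Comparing these distributions across types (e.g., which unigrams are frequent in one type but rare in the others) is one simple way to surface the type-dependent features that the type-aware classifiers exploit.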


Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f63/7100939/557ed0a3bb6e/pone.0227330.g001.jpg
