

Bias and comparison framework for abusive language datasets

Author Information

Wich Maximilian, Eder Tobias, Al Kuwatly Hala, Groh Georg

Affiliation

Technical University of Munich, Munich, Germany.

Publication Information

AI Ethics. 2022;2(1):79-101. doi: 10.1007/s43681-021-00081-0. Epub 2021 Jul 19.

DOI: 10.1007/s43681-021-00081-0
PMID: 34790954
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8288848/
Abstract

Recently, numerous datasets have been produced as research activities in the field of automatic detection of abusive language or hate speech have increased. A problem with this diversity is that they often differ, among other things, in context, platform, sampling process, collection strategy, and labeling schema. There have been surveys on these datasets, but they compare the datasets only superficially. Therefore, we developed a bias and comparison framework for abusive language datasets for their in-depth analysis and to provide a comparison of five English and six Arabic datasets. We make this framework available to researchers and data scientists who work with such datasets to be aware of the properties of the datasets and consider them in their work.


Figures 1-18 are available with the full text at https://pmc.ncbi.nlm.nih.gov/articles/PMC8288848/.

Similar Articles

1. Bias and comparison framework for abusive language datasets.
AI Ethics. 2022;2(1):79-101. doi: 10.1007/s43681-021-00081-0. Epub 2021 Jul 19.
2. Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
3. Addressing religious hate online: from taxonomy creation to automated detection.
PeerJ Comput Sci. 2022 Dec 15;8:e1128. doi: 10.7717/peerj-cs.1128. eCollection 2022.
4. A review on abusive content automatic detection: approaches, challenges and opportunities.
PeerJ Comput Sci. 2022 Nov 9;8:e1142. doi: 10.7717/peerj-cs.1142. eCollection 2022.
5. Directions in abusive language training data, a systematic review: Garbage in, garbage out.
PLoS One. 2020 Dec 28;15(12):e0243300. doi: 10.1371/journal.pone.0243300. eCollection 2020.
6. Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach.
J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609.
7. Hate speech detection with ADHAR: a multi-dialectal hate speech corpus in Arabic.
Front Artif Intell. 2024 May 30;7:1391472. doi: 10.3389/frai.2024.1391472. eCollection 2024.
8. Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models.
PLoS One. 2024 Jul 17;19(7):e0305657. doi: 10.1371/journal.pone.0305657. eCollection 2024.
9. A systematic literature review of hate speech identification on Arabic Twitter data: research challenges and future directions.
PeerJ Comput Sci. 2024 Apr 2;10:e1966. doi: 10.7717/peerj-cs.1966. eCollection 2024.
10. Hate speech and abusive language detection in Indonesian social media: Progress and challenges.
Heliyon. 2023 Jul 28;9(8):e18647. doi: 10.1016/j.heliyon.2023.e18647. eCollection 2023 Aug.

Cited By

1. Preventing profiling for ethical fake news detection.
Inf Process Manag. 2023 Mar;60(2). doi: 10.1016/j.ipm.2022.103206.

References

1. Directions in abusive language training data, a systematic review: Garbage in, garbage out.
PLoS One. 2020 Dec 28;15(12):e0243300. doi: 10.1371/journal.pone.0243300. eCollection 2020.
2. Hate speech detection: Challenges and solutions.
PLoS One. 2019 Aug 20;14(8):e0221152. doi: 10.1371/journal.pone.0221152. eCollection 2019.