Hate speech detection and racial bias mitigation in social media based on BERT model.

Affiliations

CNRS UMR5157, Télécom SudParis, Institut Polytechnique de Paris, Évry, France.

Publication

PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.

DOI: 10.1371/journal.pone.0237861
PMID: 32853205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7451563/
Abstract

Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have raised many concerns recently. Although the problem of biased datasets in abusive language detection has been addressed frequently, biases arising from trained classifiers have not yet received the same attention. In this paper, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers), and evaluate the proposed model on two publicly available datasets that have been annotated for racism, sexism, hate or offensive content on Twitter. Next, we introduce a bias alleviation mechanism to mitigate the effect of training-set bias during fine-tuning of our pre-trained BERT-based model for hate speech detection. Toward that end, we use an existing regularization method to reweight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model on the re-weighted samples. To evaluate the bias alleviation mechanism, we employ a cross-domain approach in which the classifiers trained on the aforementioned datasets predict the labels of two new Twitter datasets, the AAE-aligned and White-aligned groups, which contain tweets written in African-American English (AAE) and Standard American English (SAE), respectively. The results show the existence of systematic racial bias in the trained classifiers: they tend to assign tweets written in AAE from the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE from the White-aligned group. However, the racial bias in our classifiers is reduced significantly after our bias alleviation mechanism is incorporated. This work could constitute a first step towards debiasing hate speech and abusive language detection systems.
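The reweighting step described in the abstract can be made concrete. The paper relies on an existing regularization method that is not reproduced here; the sketch below is only a hypothetical illustration of the general idea — score each training-set n-gram by its pointwise mutual information (PMI) with the positive class, then down-weight samples containing highly class-correlated n-grams. The function names and the exponential weighting scheme are assumptions, not the authors' implementation:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def pmi_scores(docs, labels, n=1):
    """PMI between each n-gram and the positive class (label 1).
    High PMI marks n-grams strongly correlated with that label."""
    total = len(docs)
    p_pos = sum(labels) / total
    gram_count, gram_pos = Counter(), Counter()
    for toks, y in zip(docs, labels):
        for g in set(ngrams(toks, n)):  # count document frequency
            gram_count[g] += 1
            if y == 1:
                gram_pos[g] += 1
    scores = {}
    for g, c in gram_count.items():
        p_g = c / total
        p_joint = gram_pos[g] / total
        # Simplification: n-grams never seen with the positive class
        # score 0, so only positive-correlated n-grams are penalized.
        scores[g] = math.log(p_joint / (p_g * p_pos)) if p_joint > 0 else 0.0
    return scores

def sample_weights(docs, labels, n=1, alpha=1.0):
    """One weight per sample: exponentially down-weight samples whose
    most class-correlated n-gram has high PMI."""
    scores = pmi_scores(docs, labels, n)
    weights = []
    for toks in docs:
        m = max((scores[g] for g in ngrams(toks, n)), default=0.0)
        weights.append(math.exp(-alpha * max(m, 0.0)))
    return weights

# Toy corpus: "bad" only ever occurs with label 1, so samples containing
# it are down-weighted; the neutral "day" samples keep full weight.
docs = [["bad", "word"], ["bad", "stuff"], ["nice", "day"], ["ok", "day"]]
labels = [1, 1, 0, 0]
print(sample_weights(docs, labels))  # [0.5, 0.5, 1.0, 1.0]
```

The resulting weights would then multiply the per-sample loss during fine-tuning (e.g. a weighted cross-entropy), so the classifier leans less on surface n-grams and more on context.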


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfb1/7451563/e3f895387853/pone.0237861.g001.jpg

Similar Articles

1
Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
2
Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach.
J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609.
3
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
4
Addressing religious hate online: from taxonomy creation to automated detection.
PeerJ Comput Sci. 2022 Dec 15;8:e1128. doi: 10.7717/peerj-cs.1128. eCollection 2022.
5
Emotionally Informed Hate Speech Detection: A Multi-target Perspective.
Cognit Comput. 2022;14(1):322-352. doi: 10.1007/s12559-021-09862-5. Epub 2021 Jun 28.
6
Asian hate speech detection on Twitter during COVID-19.
Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. eCollection 2022.
7
Solidarity and strife after the Atlanta spa shootings: A mixed methods study characterizing Twitter discussions by qualitative analysis and machine learning.
Front Public Health. 2023 Feb 7;11:952069. doi: 10.3389/fpubh.2023.952069. eCollection 2023.
8
Retweet communities reveal the main sources of hate speech.
PLoS One. 2022 Mar 17;17(3):e0265602. doi: 10.1371/journal.pone.0265602. eCollection 2022.
9
Development of a COVID-19-Related Anti-Asian Tweet Data Set: Quantitative Study.
JMIR Form Res. 2023 Feb 28;7:e40403. doi: 10.2196/40403.
10
Fighting hate speech from bilingual hinglish speaker's perspective, a transformer- and translation-based approach.
Soc Netw Anal Min. 2022;12(1):87. doi: 10.1007/s13278-022-00920-w. Epub 2022 Jul 24.

Cited By

1
Examining Racial Discrimination Index and Black-Years of Potential Life Lost (YPLL) in South Carolina: A Real-Time Social Media Research.
J Racial Ethn Health Disparities. 2025 Apr 8. doi: 10.1007/s40615-025-02416-7.
2
Role of Synchronous, Moderated, and Anonymous Peer Support Chats on Reducing Momentary Loneliness in Older Adults: Retrospective Observational Study.
JMIR Form Res. 2024 Oct 25;8:e59501. doi: 10.2196/59501.
3
Cross-lingual hate speech detection using domain-specific word embeddings.
PLoS One. 2024 Jul 30;19(7):e0306521. doi: 10.1371/journal.pone.0306521. eCollection 2024.

References

1
Hate speech detection: Challenges and solutions.
PLoS One. 2019 Aug 20;14(8):e0221152. doi: 10.1371/journal.pone.0221152. eCollection 2019.
4
Offensive language detection in low resource languages: A use case of Persian language.
PLoS One. 2024 Jun 21;19(6):e0304166. doi: 10.1371/journal.pone.0304166. eCollection 2024.
5
Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach.
Entropy (Basel). 2024 Apr 18;26(4):344. doi: 10.3390/e26040344.
6
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
7
Brain Structure and Function gets serious about ethical science writing.
Brain Struct Funct. 2023 May;228(3-4):699-701. doi: 10.1007/s00429-023-02645-8.
8
Mining of Movie Box Office and Movie Review Topics Using Social Network Big Data.
Front Psychol. 2022 May 26;13:903380. doi: 10.3389/fpsyg.2022.903380. eCollection 2022.
9
Weight Stigma and Social Media: Evidence and Public Health Solutions.
Front Nutr. 2021 Nov 12;8:739056. doi: 10.3389/fnut.2021.739056. eCollection 2021.
10
Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes.
J Med Internet Res. 2021 Sep 15;23(9):e27314. doi: 10.2196/27314.