
Asian hate speech detection on Twitter during COVID-19.

Authors

Toliyat Amir, Levitan Sarah Ita, Peng Zheng, Etemadpour Ronak

Affiliations

Computer Science Program, Graduate Center, City University of New York, New York, NY, United States.

Computer Science Program, Hunter College, City University of New York, New York, NY, United States.

Publication information

Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. eCollection 2022.

DOI:10.3389/frai.2022.932381
PMID:36046150
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9421075/
Abstract

Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late 2019 and, after spreading widely in Asian countries, rapidly reached other countries. Governments worldwide declared a public health crisis and took severe measures to slow the spread of the disease. The pandemic affected the lives of millions of people. Many citizens who lost loved ones and jobs experienced a wide range of emotions, such as disbelief, shock, concerns about health, fear about food supplies, anxiety, and panic. These circumstances led to the spread of racism and hate against Asians in Western countries, especially in the United States. An analysis of official preliminary police data by the Center for the Study of Hate & Extremism at California State University shows that anti-Asian hate crime in 16 of America's largest cities increased by 149% in 2020. In this study, we first chose a baseline of Americans' hate crimes against Asians on Twitter. We then present an approach to balance the biased dataset and consequently improve the performance of tweet classification. We downloaded 10 million tweets through the Twitter API v2; this study uses a small portion of them, and we will use the entire dataset in a future study. Three thousand tweets from our collected corpus were annotated by four annotators, three Asian and one Asian-American. Using this data, we built predictive models of hate speech with various machine learning and deep learning methods. Our machine learning methods include Random Forest, K-nearest neighbors (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Tree, and Naive Bayes. Our deep learning models include basic Long Short-Term Memory (LSTM), Bidirectional LSTM, Bidirectional LSTM with dropout, a convolutional model, and Bidirectional Encoder Representations from Transformers (BERT). We also adjusted our dataset by filtering out tweets that were ambiguous to the annotators, as indicated by low Fleiss' kappa agreement between annotators. Our final results showed that Logistic Regression achieved the best performance among the statistical machine learning models, with an F1 score of 0.72, while BERT achieved the best performance among the deep learning models, with an F1 score of 0.85.
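The abstract mentions filtering ambiguous tweets based on low Fleiss' kappa agreement but does not spell out the procedure. The following is a minimal sketch of one way such a filter could work, not the authors' implementation. It computes the standard Fleiss' kappa over a table of per-tweet label counts and drops tweets whose per-item agreement falls below a cutoff. The function names, the toy counts, and the 0.6 threshold are all illustrative assumptions.

```python
from typing import List, Sequence


def item_agreement(row: Sequence[int]) -> float:
    """Fraction of annotator pairs that agree on one item (P_i in Fleiss' kappa)."""
    n = sum(row)  # number of annotators who rated this item
    return (sum(c * c for c in row) - n) / (n * (n - 1))


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = annotators assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of annotators")
    n_cats = len(counts[0])
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_bar = sum(item_agreement(row) for row in counts) / n_items  # observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all labels fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def filter_ambiguous(counts: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Indices of items whose per-item agreement is at least `threshold`."""
    return [i for i, row in enumerate(counts) if item_agreement(row) >= threshold]


# Toy data (hypothetical): 5 tweets, 4 annotators, columns = [hate, not hate].
toy_counts = [[4, 0], [3, 1], [2, 2], [0, 4], [4, 0]]
kappa_before = fleiss_kappa(toy_counts)
kept = filter_ambiguous(toy_counts, threshold=0.6)  # 0.6 is an assumed cutoff
kappa_after = fleiss_kappa([toy_counts[i] for i in kept])
print(kappa_before, kept, kappa_after)
```

With four annotators and two labels, per-item agreement can only take the values 1 (4 vs 0), 0.5 (3 vs 1), or about 0.33 (2 vs 2), so any cutoff above 0.5 keeps only unanimously labeled tweets. A real pipeline would have to choose the cutoff with the resulting loss of training data in mind.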

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a3d/9421075/482fc971c49c/frai-05-932381-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a3d/9421075/ba691f5453ec/frai-05-932381-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a3d/9421075/f9c748d65e20/frai-05-932381-g0002.jpg

Similar articles

1
Asian hate speech detection on Twitter during COVID-19.
Front Artif Intell. 2022 Aug 15;5:932381. doi: 10.3389/frai.2022.932381. eCollection 2022.
2
Development of a COVID-19-Related Anti-Asian Tweet Data Set: Quantitative Study.
JMIR Form Res. 2023 Feb 28;7:e40403. doi: 10.2196/40403.
3
Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach.
J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609.
4
From Tweets to Streets: Observational Study on the Association Between Twitter Sentiment and Anti-Asian Hate Crimes in New York City from 2019 to 2022.
J Med Internet Res. 2024 Sep 9;26:e53050. doi: 10.2196/53050.
5
Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.
6
Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid.
J Med Internet Res. 2021 May 3;23(5):e26616. doi: 10.2196/26616.
7
Applying Machine Learning to Identify Anti-Vaccination Tweets during the COVID-19 Pandemic.
Int J Environ Res Public Health. 2021 Apr 12;18(8):4069. doi: 10.3390/ijerph18084069.
8
Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis.
J Med Internet Res. 2023 Oct 16;25:e47014. doi: 10.2196/47014.
9
Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study.
JMIR Med Inform. 2021 Sep 17;9(9):e27670. doi: 10.2196/27670.
10
Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.
J Med Internet Res. 2021 Jan 22;23(1):e25314. doi: 10.2196/25314.
