
An automated approach to identify sarcasm in low-resource language.

Author information

Khan Shumaila, Qasim Iqbal, Khan Wahab, Khan Aurangzeb, Ali Khan Javed, Qahmash Ayman, Ghadi Yazeed Yasin

Affiliations

Institute of CS & IT, University of Science & Technology, Bannu, Pakistan.

Department of Computer Science, School of Physics, Engineering & Computer Science, University of Hertfordshire, Hatfield, United Kingdom.

Publication information

PLoS One. 2024 Dec 5;19(12):e0307186. doi: 10.1371/journal.pone.0307186. eCollection 2024.

DOI: 10.1371/journal.pone.0307186
PMID: 39637015
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11620596/
Abstract

Sarcasm detection has gained attention because of its applicability in natural language processing (NLP), yet it remains largely unexplored in low-resource languages such as Urdu, Arabic, Pashto, and Roman-Urdu. Most existing work targets English, and few studies address sarcasm in low-resource languages. This research addresses the gap by exploring the efficacy of diverse machine learning (ML) algorithms in identifying sarcasm in Urdu. The scarcity of annotated datasets for low-resource languages poses a challenge. To overcome it, we curated and released a comparatively large dataset, the Urdu Sarcastic Tweets (UST) dataset, comprising user-generated comments from X (formerly Twitter). Automatic sarcasm detection in text uses computational methods to determine whether a given statement is intended to be sarcastic. The task is difficult because sarcasm depends on the user's behavior, attitude, and expression of emotions. We therefore employ several baseline ML classifiers and evaluate their effectiveness in detecting sarcasm in low-resource languages. The primary models evaluated in this study are support vector machine (SVM), decision tree (DT), K-Nearest Neighbor (K-NN), linear regression (LR), random forest (RF), Naïve Bayes (NB), and XGBoost. We validated the performance of these classifiers on two distinct datasets: the Tanz-Indicator and the UST dataset. The SVM classifier consistently outperformed the other ML models, with an accuracy of 0.85 across various experimental setups. This research underscores the importance of tailoring sarcasm detection approaches to the specific linguistic characteristics of low-resource languages, paving the way for future investigations. By providing open access to the UST dataset, we encourage its use as a benchmark for sarcasm detection research in similar linguistic contexts.
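
To make the evaluation setup concrete, here is a minimal sketch of one of the listed baselines, Naïve Bayes, trained on labeled text and scored by accuracy. The abstract does not specify features, preprocessing, or hyperparameters, so the bag-of-words representation, whitespace tokenizer, and add-one smoothing are assumptions. The example sentences are invented English placeholders, not drawn from the UST or Tanz-Indicator datasets, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy, invented examples (NOT from the UST or Tanz-Indicator datasets).
# Label 1 = sarcastic, 0 = not sarcastic.
TRAIN = [
    ("oh great another power cut just what i needed", 1),
    ("wow such amazing traffic today love it", 1),
    ("great job breaking the server again", 1),
    ("the match starts at seven tonight", 0),
    ("please send me the report by monday", 0),
    ("the weather is sunny today", 0),
]
TEST = [
    ("oh great another meeting just what i needed", 1),
    ("the report is due monday", 0),
]


def tokenize(text):
    """Whitespace tokenizer (assumption); Urdu text would need proper tokenization."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen during training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train_naive_bayes(TRAIN)
print(f"toy accuracy: {accuracy(model, TEST):.2f}")
```

The accuracy on this toy split says nothing about the 0.85 reported for the SVM. In practice, a library such as scikit-learn or XGBoost would typically supply the other baselines, and each would be scored with the same accuracy computation.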



