Multi-label emotion classification of Urdu tweets.

Authors

Ashraf Noman, Khan Lal, Butt Sabur, Chang Hsien-Tsung, Sidorov Grigori, Gelbukh Alexander

Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico.

Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan.

Publication information

PeerJ Comput Sci. 2022 Apr 22;8:e896. doi: 10.7717/peerj-cs.896. eCollection 2022.

DOI:10.7717/peerj-cs.896
PMID:35494831
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9044368/
Abstract

Urdu is widely used in South Asia and around the world. Although comparable datasets exist for English, we created the first multi-label emotion dataset for Urdu: 6,043 tweets in the Urdu Nastalíq script, annotated with six basic emotions. We adopted a multi-label (ML) classification approach to detect emotions in Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection challenging. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random Forest (RF), Decision Tree (J48), Sequential Minimal Optimization (SMO), AdaBoostM1, and Bagging), deep learning algorithms (1D Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric features, pre-trained word embeddings, word n-grams, and character n-grams. The paper describes the annotation guidelines and dataset characteristics and offers insights into the different methodologies used for Urdu emotion classification. We report our best results using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM) for all tested methods.
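To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how micro-averaged F1, macro-averaged F1, Hamming loss, and exact match are commonly computed for multi-label predictions. It is not the authors' evaluation code. The function names (`f1`, `multilabel_scores`), the 0/1 label-matrix layout, and the toy data are assumptions made for illustration. The abstract does not define "accuracy" for the multi-label setting, so that metric is left out here rather than guessed.

```python
from typing import Dict, Sequence

# Assumed layout: one row per tweet, one 0/1 column per emotion label.
Labels = Sequence[Sequence[int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true: Labels, y_pred: Labels) -> Dict[str, float]:
    """Compute four common multi-label metrics (hypothetical helper)."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    # Per-label confusion counts.
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    mismatches = 0  # wrongly predicted label slots, for Hamming loss
    exact = 0       # rows where every label is correct, for exact match

    for t_row, p_row in zip(y_true, y_pred):
        row_errors = 0
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
            if t != p:
                row_errors += 1
        mismatches += row_errors
        if row_errors == 0:
            exact += 1

    return {
        # Micro: pool the counts over all labels, then compute one F1.
        "micro_f1": f1(sum(tp), sum(fp), sum(fn)),
        # Macro: compute F1 per label, then take the unweighted mean.
        "macro_f1": sum(f1(*c) for c in zip(tp, fp, fn)) / n_labels,
        # Fraction of individual label decisions that are wrong.
        "hamming_loss": mismatches / (n_samples * n_labels),
        # Fraction of tweets whose full label set is predicted exactly.
        "exact_match": exact / n_samples,
    }


if __name__ == "__main__":
    # Toy data: 3 tweets, 3 labels (the paper's dataset uses six emotions).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
    for name, value in multilabel_scores(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Micro-averaging weights every label decision equally, so frequent emotions dominate the score, while macro-averaging weights every emotion equally, which exposes weak performance on rare labels. Exact match is the strictest of the four, since a single wrong label makes the whole tweet count as an error.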

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5499/9044368/30c8bf053e49/peerj-cs-08-896-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5499/9044368/3e0d42bbfe03/peerj-cs-08-896-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5499/9044368/14384b3b0da3/peerj-cs-08-896-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5499/9044368/b8e4aea71dad/peerj-cs-08-896-g004.jpg
