Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts.

Authors

Faisal Moshiur Rahman, Shifa Ashrin Mobashira, Rahman Md Hasibur, Uddin Mohammed Arif, Rahaman Rashedur M

Affiliation

Department of Electrical and Computer Engineering, North South University, Dhaka-1229, Bangladesh.

Publication

Data Brief. 2024 Jul 20;55:110760. doi: 10.1016/j.dib.2024.110760. eCollection 2024 Aug.

DOI:10.1016/j.dib.2024.110760
PMID:39183968
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11342900/
Abstract

The ever-evolving global landscape of communication, driven by advances in information technology, underscores the importance of emotion detection in natural language processing. However, interpreting emotions in linguistically diverse contexts remains challenging, notably in low-resource languages such as Bengali, and the emergence of Banglish compounds the difficulty. To address this gap, we present "Bengali & Banglish," an extensive dataset comprising 80,098 labelled samples across six emotion classes. Our dataset fills a void in fine-grained emotion classification for Bengali and pioneers emotion detection in Banglish. Through meticulous annotation and rigorous evaluation, we achieve a weighted F1 score of 71.30% for Bengali and 64.59% for Banglish using BanglaBERT. The dataset also facilitates Bengali-to-Banglish machine translation, contributing to the advancement of language processing models. Furthermore, it attains a high Cohen's Kappa score of 93.5%, affirming the reliability and consistency of our annotations. This research underscores the importance of linguistic diversity in NLP and provides a valuable resource for enhancing emotion detection capabilities in Bengali and Banglish across digital platforms.
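The abstract reports two kinds of figures: a weighted F1 score for the classifier and a Cohen's Kappa for annotator agreement. As a rough illustration of what these metrics measure, here is a minimal sketch in plain Python. The function names and the toy labels ("joy", "anger", "sad") are hypothetical and chosen only for illustration; they are not taken from the dataset, whose six class names the abstract does not list, and this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example with hypothetical labels (not data from the paper).
gold = ["joy", "joy", "anger", "sad"]
pred = ["joy", "anger", "anger", "sad"]
print(weighted_f1(gold, pred))
print(cohen_kappa(gold, pred))
```

Weighting each class's F1 by its support keeps frequent classes from being under-counted when the class distribution is imbalanced, while Kappa discounts the agreement two annotators would reach by chance alone.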
