Similar articles

1
A dataset of Roman Urdu text with spelling variations for sentence level sentiment analysis.
Data Brief. 2024 Nov 23;57:111170. doi: 10.1016/j.dib.2024.111170. eCollection 2024 Dec.
2
Dataset construction to detect human behavior with the help of emotions, sentiments and mood for Roman Urdu.
Data Brief. 2023 Dec 9;52:109906. doi: 10.1016/j.dib.2023.109906. eCollection 2024 Feb.
3
Multi-class sentiment analysis of urdu text using multilingual BERT.
Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9.
4
A hybrid dependency-based approach for Urdu sentiment analysis.
Sci Rep. 2023 Dec 12;13(1):22075. doi: 10.1038/s41598-023-48817-8.
5
Ensemble stacked model for enhanced identification of sentiments from IMDB reviews.
Sci Rep. 2025 Apr 18;15(1):13405. doi: 10.1038/s41598-025-97561-8.
6
Depression detection with machine learning of structural and non-structural dual languages.
Healthc Technol Lett. 2024 Jun 10;11(4):218-226. doi: 10.1049/htl2.12088. eCollection 2024 Aug.
7
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
8
Sentiment analysis techniques, challenges, and opportunities: Urdu language-based analytical study.
PeerJ Comput Sci. 2022 Aug 31;8:e1032. doi: 10.7717/peerj-cs.1032. eCollection 2022.
9
A conditional random field based approach for high-accuracy part-of-speech tagging using language-independent features.
PeerJ Comput Sci. 2024 Dec 11;10:e2577. doi: 10.7717/peerj-cs.2577. eCollection 2024.
10
Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization.
Sci Rep. 2024 Nov 19;14(1):28590. doi: 10.1038/s41598-024-79106-7.

A dataset of Roman Urdu text with spelling variations for sentence level sentiment analysis.

Authors

Soomro Mudasar Ahmed, Memon Rafia Naz, Chandio Asghar Ali, Leghari Mehwish, Soomro Muhammad Hanif

Affiliations

Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, Pakistan.

Department of Software Engineering, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, Pakistan.

Publication information

Data Brief. 2024 Nov 23;57:111170. doi: 10.1016/j.dib.2024.111170. eCollection 2024 Dec.

DOI: 10.1016/j.dib.2024.111170
PMID: 39736901
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11683287/
Abstract

Roman Urdu text is widespread on many websites. People often prefer to write their social comments or product reviews in Roman Urdu, yet Roman Urdu is regarded as a non-standard language. The main reason is that Roman Urdu has no spelling rules, so people create and post their own spellings; for example, "2mro" is a non-standard spelling of "tomorrow". This paper presents two Roman Urdu datasets. The first is a collection of Roman Urdu words with spelling variations: it contains 5244 words, each with one to five different spellings. The second consists of Roman Urdu reviews collected from seven different internet-based sources. It contains multiclass reviews labeled "very positive," "positive," "very negative," "negative," and "neutral". In total, 28,090 reviews were gathered. The sentiment labels were assigned by domain experts familiar with the Urdu language.
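
One way a word-variation list like the one described in the abstract might be used is to map each spelling variant to a single canonical form before sentiment classification. The Python sketch below illustrates that idea only. The lexicon layout, the function names, and every entry except "2mro" are assumptions made for illustration, not the published dataset's actual format or contents.

```python
import re

# Illustrative lexicon: canonical form -> known spelling variants.
# Only "2mro" comes from the abstract; the other entries are hypothetical
# and are NOT taken from the published dataset.
VARIANT_LEXICON = {
    "tomorrow": ["2mro"],
    "acha": ["achha", "accha"],
    "bohat": ["bahut", "bohot"],
}

# The five review classes named in the abstract.
SENTIMENT_LABELS = [
    "very negative", "negative", "neutral", "positive", "very positive",
]


def build_variant_index(lexicon):
    """Invert the lexicon so that every spelling maps to its canonical form."""
    index = {}
    for canonical, variants in lexicon.items():
        key = canonical.lower()
        index[key] = key
        for variant in variants:
            index[variant.lower()] = key
    return index


def normalize(text, index):
    """Lowercase, tokenize on letters and digits, and replace known variants."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(index.get(token, token) for token in tokens)


VARIANT_INDEX = build_variant_index(VARIANT_LEXICON)
normalized = normalize("Yeh phone bahut ACHHA hai, 2mro milega", VARIANT_INDEX)
print(normalized)
```

Tokens that are not in the index pass through unchanged, so the normalizer can be applied to reviews that mix Roman Urdu with English words.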
