BanglaTense: A large-scale dataset of Bangla sentences categorized by tense: Past, present, and future.

Authors

Bijoy Md Hasan Imam, Ayman Umme, Islam Md Monarul

Affiliation

Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1216, Bangladesh.

Publication information

Data Brief. 2025 Feb 19;59:111400. doi: 10.1016/j.dib.2025.111400. eCollection 2025 Apr.

DOI:10.1016/j.dib.2025.111400
PMID:40103760
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11919380/
Abstract

Bengali, an Indo-Aryan language, features a complex grammatical structure with tenses, which is crucial for natural language processing (NLP) applications like text classification, machine translation, and sentiment analysis. The BanglaTense dataset is a large-scale, meticulously curated collection of Bangla sentences categorized by their tense: Past, present, and future. Addressing the resource gap in NLP for the Bangla language, BanglaTense provides a curated resource for Bangla sentence classification, featuring 17,819 annotated sentences, with 5,629 in the past tense, 6,101 in the present tense, and 6,089 in the future tense. This dataset is a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models while ensuring balanced representation across categories. Preprocessing steps are applied to enhance data quality, including anonymization and duplicate removal. Three native Bangla speakers independently assessed the tense labels of the sentences, ensuring the dataset's reliability. BanglaTense is designed to advance research and development in NLP for Bangla, offering valuable applications in tense detection, text classification, language modeling, and educational tools. This dataset supports linguistic study and enhances the development of precise and context-aware NLP models by providing a robust foundation for temporal analysis in Bangla sentences. The dataset is openly available for academic and research purposes, promoting collaboration and innovation within the Bangla NLP community.
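
The class sizes quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration built from the three counts stated above; it does not load the dataset itself, and the variable names and the largest-to-smallest ratio used as a balance measure are assumptions, not something the authors describe.

```python
# Class counts as stated in the abstract (not read from the dataset files).
counts = {"past": 5629, "present": 6101, "future": 6089}

# Total number of annotated sentences implied by the three counts.
total = sum(counts.values())

# Share of each tense class in the whole dataset.
shares = {tense: n / total for tense, n in counts.items()}

# Illustrative balance measure (an assumption of this sketch):
# ratio of the largest class to the smallest; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"total sentences: {total}")
for tense, share in shares.items():
    print(f"{tense:>7}: {counts[tense]:>5} ({share:.1%})")
print(f"largest/smallest class ratio: {imbalance_ratio:.3f}")
```

The three counts sum to the 17,819 sentences reported, and each class holds roughly a third of the data, which is consistent with the abstract's claim of balanced representation across categories.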

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bd3/11919380/2877af38b82c/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bd3/11919380/e33ad2b82006/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bd3/11919380/a2120c2fe11b/gr2.jpg

Cited by

1
BanglaSarc3: A benchmark dataset for Bangla sarcasm detection from social media to advance Bangla NLP.
Data Brief. 2025 Aug 6;62:111953. doi: 10.1016/j.dib.2025.111953. eCollection 2025 Oct.
