BanglaNewsClassifier: A machine learning approach for news classification in Bangla Newspapers using hybrid stacking classifiers.

Authors

Hossain Tanzir, Islam Ar-Rafi, Mehedi Md Humaion Kabir, Rasel Annajiat Alim, Abdullah-Al-Wadud M, Uddin Jia

Affiliations

Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.

Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

PLoS One. 2025 Jun 9;20(6):e0321291. doi: 10.1371/journal.pone.0321291. eCollection 2025.

DOI:10.1371/journal.pone.0321291
PMID:40489455
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12148137/
Abstract

Bangla news floods the web, and the need for smarter, more efficient classification techniques is greater than ever. Previous studies mostly focused on traditional models, overlooking the potential of hybrid techniques to handle ever-growing, complex datasets and the linguistic patterns of Bangla with higher accuracy. To address this challenge, this study presents a comprehensive approach to classifying Bangla news articles into eight distinct categories using various machine learning and deep learning techniques. Our experiments covered traditional machine learning algorithms, deep learning architectures, and hybrid models, including novel stacking classifiers. The study used a dataset of 118,404 Bangla news articles and applied feature extraction techniques including TF-IDF vectorization and Word2Vec embeddings. Our best-performing model, a stacking meta-classifier combining bidirectional long short-term memory (BiLSTM) and a support vector machine (SVM), achieved 94% accuracy, outperforming all the basic models. We also provide an in-depth analysis of model performance, including confusion matrices, ROC curves, and error analysis, offering insight into the strengths and limitations of each approach. This research contributes to Bangla natural language processing and demonstrates the efficacy of ensemble methods and deep learning in news classification for low-resource languages.
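The stacking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available, uses a tiny invented English corpus with three made-up categories instead of the 118,404-article Bangla dataset, and swaps the BiLSTM base learner for a small `MLPClassifier` so the example runs without a deep learning framework. The logistic-regression meta-learner, whitespace tokenization, and all hyperparameters are likewise assumptions chosen for illustration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus (hypothetical): four short documents per category.
docs = [
    "team wins cricket match",
    "cricket captain scores century",
    "football team loses final match",
    "striker scores goal in football final",
    "bank raises interest rate",
    "stock market falls as inflation rises",
    "central bank cuts interest rate",
    "inflation hits stock market investors",
    "new phone launches with faster chip",
    "software update fixes phone bug",
    "startup builds faster chip",
    "software startup launches new app",
]
labels = ["sports"] * 4 + ["economy"] * 4 + ["technology"] * 4

# Base learners: a linear SVM plus a small neural network standing in
# for the BiLSTM described in the abstract (assumption, not the paper's model).
base_learners = [
    ("svm", LinearSVC()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
]

# The meta-learner is trained on out-of-fold outputs of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=2,  # two folds only because the toy corpus is tiny
)

# TF-IDF features feed the stacked ensemble; tokens are split on whitespace.
model = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), stack)
model.fit(docs, labels)

predictions = model.predict(["cricket team wins final", "central bank cuts rate"])
print(list(predictions))
```

The whitespace `token_pattern` is a deliberate choice here: the vectorizer's default pattern relies on `\w`, which may split words in scripts that use combining vowel signs, so real Bangla text would likely need a tokenizer suited to the language.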

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/ebd53ec0512e/pone.0321291.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/2287c579ee38/pone.0321291.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/ad66db612185/pone.0321291.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/f77fa5e70ab5/pone.0321291.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/b4d4255ebbb0/pone.0321291.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/827e5edbc368/pone.0321291.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/00b5312866f4/pone.0321291.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/c95e73a0b617/pone.0321291.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/23173a20d875/pone.0321291.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/4eecf254ddd6/pone.0321291.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/1496522a33fc/pone.0321291.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/5baa465d5769/pone.0321291.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/0853189f7522/pone.0321291.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/d39b3751faf4/pone.0321291.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/5191d1c587f8/pone.0321291.g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1d0/12148137/03d759fd56d4/pone.0321291.g016.jpg