Arabic text classification: the need for multi-labeling systems.

Authors

El Rifai Hozayfa, Al Qadi Leen, Elnagar Ashraf

Affiliation

Department of Computer Science, University of Sharjah, Sharjah, UAE.

Publication

Neural Comput Appl. 2022;34(2):1135-1159. doi: 10.1007/s00521-021-06390-z. Epub 2021 Sep 1.

DOI:10.1007/s00521-021-06390-z
PMID:34483495
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8408369/
Abstract

The process of tagging a given text or document with suitable labels is known as text categorization or classification. The aim of this work is to automatically tag a news article based on its vocabulary features. To accomplish this objective, two large datasets have been constructed from various Arabic news portals. The first dataset contains 90k single-labeled articles from 4 domains (Business, Middle East, Technology and Sports). The second dataset has over 290k multi-tagged articles. To examine the single-label dataset, we employed an array of ten shallow learning classifiers. Furthermore, we added an ensemble model that applies majority voting over all studied classifiers. The performance of the classifiers on the first dataset ranged between 87.7% (AdaBoost) and 97.9% (SVM). Analyzing some of the misclassified articles confirmed the need for multi-label, as opposed to single-label, categorization for better classification results. For the second dataset, we tested both shallow learning and deep learning multi-labeling approaches. A custom accuracy metric, designed for the multi-labeling task, has been developed for performance evaluation, along with the Hamming loss metric. Firstly, we used classifiers that were compatible with multi-labeling tasks, such as Logistic Regression and XGBoost, by wrapping each in a OneVsRest classifier. XGBoost gave the higher accuracy, scoring 84.7%, while Logistic Regression scored 81.3%. Secondly, ten neural networks were constructed (CNN, CLSTM, LSTM, BILSTM, GRU, CGRU, BIGRU, HANGRU, CRF-BILSTM and HANLSTM). CGRU proved to be the best multi-labeling classifier, scoring an accuracy of 94.85%, higher than the rest of the classifiers.
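Two of the ideas named in the abstract, majority voting over several classifiers and the Hamming loss for multi-label predictions, can be illustrated with a small sketch. This is not the authors' code: the function names `majority_vote` and `hamming_loss`, the tie-breaking rule, and the toy labels are assumptions made for illustration. The abstract does not define the custom accuracy metric, so it is not reproduced here.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the most classifiers.

    Assumption: ties are broken in favour of the label seen first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (article, tag) slots that are predicted incorrectly.

    Each row is a binary indicator vector with one entry per tag.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("label rows must have the same length")
        total += len(true_row)
        wrong += sum(t != p for t, p in zip(true_row, pred_row))
    if total == 0:
        raise ValueError("no labels to score")
    return wrong / total


# Hypothetical single-label example: three classifiers vote on one article.
vote = majority_vote(["Sports", "Business", "Sports"])

# Hypothetical multi-label example: two articles, four possible tags each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
loss = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 8
```

A lower Hamming loss is better, and a value of 0 means every tag on every article was predicted correctly. Unlike exact-match accuracy, it gives partial credit when only some of an article's tags are right.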

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/757a4c495aea/521_2021_6390_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/6ac18d3d7206/521_2021_6390_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/48109e50642b/521_2021_6390_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/265f2f082b98/521_2021_6390_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/15d5476a3d94/521_2021_6390_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/c153dbc37ee0/521_2021_6390_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/8cba9d79ee27/521_2021_6390_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/ca0f8bcdf2c5/521_2021_6390_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/104c86c6074a/521_2021_6390_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/41d1dd09d931/521_2021_6390_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/4569c6211cd4/521_2021_6390_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/b4c64bf18b9e/521_2021_6390_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/3f6d76b9a306/521_2021_6390_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58b5/8408369/eb373821be4d/521_2021_6390_Fig14_HTML.jpg

Similar articles

1
Arabic text classification: the need for multi-labeling systems.
Neural Comput Appl. 2022;34(2):1135-1159. doi: 10.1007/s00521-021-06390-z. Epub 2021 Sep 1.
2
DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning.
Comput Biol Med. 2024 Mar;170:107921. doi: 10.1016/j.compbiomed.2024.107921. Epub 2024 Jan 4.
3
SANAD: Single-label Arabic News Articles Dataset for automatic text categorization.
Data Brief. 2019 Jun 4;25:104076. doi: 10.1016/j.dib.2019.104076. eCollection 2019 Aug.
4
Co-Labeling for Multi-View Weakly Labeled Learning.
IEEE Trans Pattern Anal Mach Intell. 2016 Jun;38(6):1113-25. doi: 10.1109/TPAMI.2015.2476813. Epub 2015 Sep 4.
5
Building a challenging medical dataset for comparative evaluation of classifier capabilities.
Comput Biol Med. 2024 Aug;178:108721. doi: 10.1016/j.compbiomed.2024.108721. Epub 2024 Jun 19.
6
GHS-NET a generic hybridized shallow neural network for multi-label biomedical text classification.
J Biomed Inform. 2021 Apr;116:103699. doi: 10.1016/j.jbi.2021.103699. Epub 2021 Feb 15.
7
One- and Two-Phase Software Requirement Classification Using Ensemble Deep Learning.
Entropy (Basel). 2021 Sep 28;23(10):1264. doi: 10.3390/e23101264.
8
Multi-label emotion classification of Urdu tweets.
PeerJ Comput Sci. 2022 Apr 22;8:e896. doi: 10.7717/peerj-cs.896. eCollection 2022.
9
Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media.
J Big Data. 2021;8(1):95. doi: 10.1186/s40537-021-00488-w. Epub 2021 Jul 2.
10
A novel framework based on the multi-label classification for dynamic selection of classifiers.
Int J Mach Learn Cybern. 2023;14(6):2137-2154. doi: 10.1007/s13042-022-01751-z. Epub 2023 Jan 2.

Cited by

1
Integrating CNN and transformer architectures for superior Arabic printed and handwriting characters classification.
Sci Rep. 2025 Aug 15;15(1):29936. doi: 10.1038/s41598-025-12045-z.
2
Determining the meter of classical Arabic poetry using deep learning: a performance analysis.
Front Artif Intell. 2025 Feb 14;8:1523336. doi: 10.3389/frai.2025.1523336. eCollection 2025.
3
A Novel Preoperative Prediction Model Based on Deep Learning to Predict Neoplasm T Staging and Grading in Patients with Upper Tract Urothelial Carcinoma.
J Clin Med. 2022 Sep 30;11(19):5815. doi: 10.3390/jcm11195815.
4
Analysing Hate Speech against Migrants and Women through Tweets Using Ensembled Deep Learning Model.
Comput Intell Neurosci. 2022 Apr 10;2022:8153791. doi: 10.1155/2022/8153791. eCollection 2022.