
Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification.

Authors

Endalie Demeke, Haile Getamesay, Taye Abebe Wondmagegn

Affiliations

Faculty of Computing and Informatics, Jimma Institute of Technology, Jimma, Oromia, Ethiopia.

Faculty of Civil and Environmental Engineering, Jimma Institute of Technology, Jimma, Oromia, Ethiopia.

Publication information

PeerJ Comput Sci. 2022 Apr 25;8:e961. doi: 10.7717/peerj-cs.961. eCollection 2022.

DOI: 10.7717/peerj-cs.961
PMID: 35634124
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9137894/
Abstract

Text classification is the process of assigning documents to a predefined set of categories based on their content. Text classification algorithms typically represent documents as collections of words, so they must handle a large number of features. Selecting appropriate features becomes important when the initial feature set is large. In this paper, we present a hybrid feature selection method for Amharic text classification that combines document frequency (DF) with a genetic algorithm (GA). We evaluate this method on Amharic news documents obtained from the Ethiopian News Agency (ENA), covering 13 categories. Our experimental results show that the proposed method outperforms other feature selection methods that have been used for Amharic news document classification. Combined with the Extra Tree Classifier (ETC), it improves classification accuracy by up to 1% over the hybrid of DF, information gain (IG), chi-square (CHI), and principal component analysis (PCA), by 2.47% over GA, and by 3.86% over a hybrid of DF, IG, and CHI.
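The abstract describes a two-stage pipeline: DF filtering first shrinks the vocabulary, then a GA searches for a good subset of the remaining features. The Python sketch below shows one way such a hybrid could be wired together; it is not the authors' implementation. Everything concrete in it is an assumption: the toy English corpus, the `min_df` threshold, the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the per-feature penalty, and the fitness function, which scores masks with a stdlib-only nearest-centroid classifier as a stand-in for the Extra Tree Classifier. With scikit-learn installed, `sklearn.ensemble.ExtraTreesClassifier` could be trained and scored inside the fitness function instead.

```python
import random
from collections import Counter


def df_filter(docs, min_df):
    """DF stage: keep terms that appear in at least `min_df` training documents."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))  # a term counts once per document
    return sorted(t for t, c in df.items() if c >= min_df)


def vectorize(docs, vocab):
    """Binary bag-of-words rows over the DF-filtered vocabulary, plus labels."""
    X, y = [], []
    for tokens, label in docs:
        present = set(tokens)
        X.append([1 if t in present else 0 for t in vocab])
        y.append(label)
    return X, y


def make_fitness(X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Hypothetical fitness: nearest-centroid validation accuracy on the selected
    features minus `penalty` per selected feature (a stand-in for ETC)."""
    labels = sorted(set(y_tr))  # sorted so tie-breaking is deterministic

    def fitness(mask):
        idx = [i for i, bit in enumerate(mask) if bit]
        if not idx:
            return 0.0
        centroids = {}
        for lab in labels:
            rows = [x for x, y in zip(X_tr, y_tr) if y == lab]
            centroids[lab] = [sum(r[i] for r in rows) / len(rows) for i in idx]
        correct = 0
        for x, y in zip(X_va, y_va):
            sub = [x[i] for i in idx]
            pred = min(labels, key=lambda lab: sum(
                (a - b) ** 2 for a, b in zip(sub, centroids[lab])))
            correct += pred == y
        return correct / len(y_va) - penalty * len(idx)

    return fitness


def genetic_select(n, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """GA stage: evolve binary masks over the n DF-filtered features."""
    rng = random.Random(seed)
    pop = [[1] * n]  # seed with the DF-only baseline (keep every feature)
    pop += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy corpus of (tokens, label) pairs; the paper used Amharic ENA news.
TRAIN = [
    ("goal match team win".split(), "sport"),
    ("team coach match league".split(), "sport"),
    ("market price bank trade".split(), "economy"),
    ("bank loan price inflation".split(), "economy"),
    ("election party vote minister".split(), "politics"),
    ("minister party parliament vote".split(), "politics"),
]
VALID = [
    ("team win league".split(), "sport"),
    ("price trade inflation".split(), "economy"),
    ("vote election parliament".split(), "politics"),
]

vocab = df_filter(TRAIN, min_df=2)
X_tr, y_tr = vectorize(TRAIN, vocab)
X_va, y_va = vectorize(VALID, vocab)
fit = make_fitness(X_tr, y_tr, X_va, y_va)
best_mask = genetic_select(len(vocab), fit)
selected = [t for t, bit in zip(vocab, best_mask) if bit]
print(selected, round(fit(best_mask), 2))
```

Seeding the population with the all-ones mask (the DF-only feature set) and keeping the best mask each generation means the returned subset never scores worse than DF filtering alone under this fitness function; the per-feature penalty steers the search toward smaller subsets when accuracy is tied.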

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8721/9137894/24e5b5b1fbd1/peerj-cs-08-961-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8721/9137894/795e1982f640/peerj-cs-08-961-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8721/9137894/768c73733923/peerj-cs-08-961-g002.jpg

Similar articles

1
Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification.
PeerJ Comput Sci. 2022 Apr 25;8:e961. doi: 10.7717/peerj-cs.961. eCollection 2022.
2
Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.
PLoS One. 2021 May 21;16(5):e0251902. doi: 10.1371/journal.pone.0251902. eCollection 2021.
3
Automated Amharic News Categorization Using Deep Learning Models.
Comput Intell Neurosci. 2021 Jul 27;2021:3774607. doi: 10.1155/2021/3774607. eCollection 2021.
4
Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.
J Forensic Leg Med. 2018 Jul;57:41-50. doi: 10.1016/j.jflm.2017.07.001. Epub 2017 Jul 4.
5
Normalized effect size (NES): a novel feature selection model for Urdu fake news classification.
PeerJ Comput Sci. 2023 Oct 24;9:e1612. doi: 10.7717/peerj-cs.1612. eCollection 2023.
6
Relevance popularity: A term event model based feature selection scheme for text classification.
PLoS One. 2017 Apr 5;12(4):e0174341. doi: 10.1371/journal.pone.0174341. eCollection 2017.
7
Feature selection using regularized neighbourhood component analysis to enhance the classification performance of motor imagery signals.
Comput Biol Med. 2019 Apr;107:118-126. doi: 10.1016/j.compbiomed.2019.02.009. Epub 2019 Feb 19.
8
Generalized Term Similarity for Feature Selection in Text Classification Using Quadratic Programming.
Entropy (Basel). 2020 Mar 30;22(4):395. doi: 10.3390/e22040395.
9
Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines.
BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127.
10
Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction.
Artif Intell Med. 2010 Sep;50(1):43-53. doi: 10.1016/j.artmed.2010.04.011. Epub 2010 May 31.

Cited by

1
MSBKA: A Multi-Strategy Improved Black-Winged Kite Algorithm for Feature Selection of Natural Disaster Tweets Classification.
Biomimetics (Basel). 2025 Jan 10;10(1):41. doi: 10.3390/biomimetics10010041.
2
Deep learning-based idiomatic expression recognition for the Amharic language.
PLoS One. 2023 Dec 14;18(12):e0295339. doi: 10.1371/journal.pone.0295339. eCollection 2023.
3
Analysis of lung cancer risk factors from medical records in Ethiopia using machine learning.
PLOS Digit Health. 2023 Jul 19;2(7):e0000308. doi: 10.1371/journal.pdig.0000308. eCollection 2023 Jul.

References

1
Automated Amharic News Categorization Using Deep Learning Models.
Comput Intell Neurosci. 2021 Jul 27;2021:3774607. doi: 10.1155/2021/3774607. eCollection 2021.
2
Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.
PLoS One. 2021 May 21;16(5):e0251902. doi: 10.1371/journal.pone.0251902. eCollection 2021.