Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.

Affiliations

Faculty of Computing and Informatics, Jimma Institute of Technology, Jimma, Ethiopia.

Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.

Publication information

PLoS One. 2021 May 21;16(5):e0251902. doi: 10.1371/journal.pone.0251902. eCollection 2021.

DOI:10.1371/journal.pone.0251902
PMID:34019571
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8139506/
Abstract

The volume of Amharic digital documents has grown rapidly in recent years, making automatic document categorization essential. In this paper, we present a novel dimension reduction approach that improves classification accuracy by combining feature selection and feature extraction. The method uses Information Gain (IG), the Chi-square test (CHI), and Document Frequency (DF) to select important features, and Principal Component Analysis (PCA) to refine the selected features. We evaluate the proposed method on a dataset containing 9 news categories. Our experimental results show that it outperforms the other methods: classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51%, and 10.19% higher than with IG, CHI, and DF, respectively. Further work is required, since classification accuracy still decreases as the feature size is reduced to save computational time.
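
The two-stage idea in the abstract (score terms with IG, CHI, and DF, then refine the selected terms with PCA) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how the three scores are combined, so the sketch assumes the union of the top-k terms under each criterion. The toy corpus, the helper names (`top_k`, `select_features`, `first_component`), and the use of power iteration to obtain a single principal component are all hypothetical choices made to keep the example self-contained.

```python
import math
from collections import Counter

# Hypothetical toy corpus (tokens, label); not data from the paper.
docs = [
    ("news goal match team".split(), "sport"),
    ("news team coach goal".split(), "sport"),
    ("news vote party election".split(), "politics"),
    ("news election party law".split(), "politics"),
]
vocab = sorted({t for toks, _ in docs for t in toks})
labels = sorted({y for _, y in docs})
N = len(docs)

def doc_freq(term):
    # DF: number of documents that contain the term.
    return sum(term in toks for toks, _ in docs)

def chi_square(term):
    # CHI: largest 2x2 chi-square statistic over the classes.
    best = 0.0
    for c in labels:
        a = sum(term in t and y == c for t, y in docs)       # term present, class c
        b = sum(term in t and y != c for t, y in docs)       # term present, other class
        e = sum(term not in t and y == c for t, y in docs)   # term absent, class c
        d = sum(term not in t and y != c for t, y in docs)   # term absent, other class
        denom = (a + e) * (b + d) * (a + b) * (e + d)
        if denom:
            best = max(best, N * (a * d - e * b) ** 2 / denom)
    return best

def entropy(counts):
    counts = list(counts)
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)

def info_gain(term):
    # IG: class entropy minus entropy after splitting on term presence.
    gain = entropy(Counter(y for _, y in docs).values())
    present = Counter(y for t, y in docs if term in t)
    absent = Counter(y for t, y in docs if term not in t)
    for part in (present, absent):
        n = sum(part.values())
        if n:
            gain -= n / N * entropy(part.values())
    return gain

def top_k(score, k):
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(vocab, key=lambda t: (-score(t), t))[:k])

def select_features(k):
    # Assumption: combine the three criteria by taking the union of their top-k terms.
    return sorted(top_k(info_gain, k) | top_k(chi_square, k) | top_k(doc_freq, k))

def first_component(rows, iters=200):
    # PCA restricted to one component: center the columns, then power-iterate
    # on X^T X to find the leading direction and project each row onto it.
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        proj = [sum(x[j] * v[j] for j in range(m)) for x in X]
        w = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
        v = [wj / norm for wj in w]
    return [sum(x[j] * v[j] for j in range(m)) for x in X]

features = select_features(k=2)
matrix = [[toks.count(f) for f in features] for toks, _ in docs]
reduced = first_component(matrix)
print(features)
print([round(z, 3) for z in reduced])
```

In this toy setting DF alone favors the term `news`, which appears in every document and carries no class information, while IG and CHI favor class-specific terms. The PCA step then collapses the selected columns into one coordinate that separates the two classes. A real pipeline would keep several components and feed them to a classifier.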


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76db/8139506/7e20e96a6c9b/pone.0251902.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76db/8139506/160b9b9ac2b1/pone.0251902.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76db/8139506/5a0d70b80725/pone.0251902.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76db/8139506/0e06aad2e473/pone.0251902.g003.jpg

Similar articles

1
Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.
PLoS One. 2021 May 21;16(5):e0251902. doi: 10.1371/journal.pone.0251902. eCollection 2021.
2
Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification.
PeerJ Comput Sci. 2022 Apr 25;8:e961. doi: 10.7717/peerj-cs.961. eCollection 2022.
3
A L1-regularized feature selection method for local dimension reduction on microarray data.
Comput Biol Chem. 2017 Apr;67:92-101. doi: 10.1016/j.compbiolchem.2016.12.010. Epub 2016 Dec 31.
4
Improving fake news classification using dependency grammar.
PLoS One. 2021 Sep 14;16(9):e0256940. doi: 10.1371/journal.pone.0256940. eCollection 2021.
5
Automated Amharic News Categorization Using Deep Learning Models.
Comput Intell Neurosci. 2021 Jul 27;2021:3774607. doi: 10.1155/2021/3774607. eCollection 2021.
6
A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.
Comput Methods Programs Biomed. 2017 Jul;146:47-57. doi: 10.1016/j.cmpb.2017.05.009. Epub 2017 May 24.
7
A hybrid system based on information gain and principal component analysis for the classification of transcranial Doppler signals.
Comput Methods Programs Biomed. 2012 Sep;107(3):598-609. doi: 10.1016/j.cmpb.2011.03.013. Epub 2011 Apr 27.
8
Normalized effect size (NES): a novel feature selection model for Urdu fake news classification.
PeerJ Comput Sci. 2023 Oct 24;9:e1612. doi: 10.7717/peerj-cs.1612. eCollection 2023.
9
Designing a robust feature extraction method based on optimum allocation and principal component analysis for epileptic EEG signal classification.
Comput Methods Programs Biomed. 2015 Apr;119(1):29-42. doi: 10.1016/j.cmpb.2015.01.002. Epub 2015 Jan 30.
10
Feature selection in gene expression data using principal component analysis and rough set theory.
Adv Exp Med Biol. 2011;696:91-100. doi: 10.1007/978-1-4419-7046-6_10.

Cited by

1
An enhanced adaptive dynamic metaheuristic optimization algorithm for rainfall prediction depends on long short-term memory.
PLoS One. 2025 Jun 2;20(6):e0317554. doi: 10.1371/journal.pone.0317554. eCollection 2025.
2
Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification.
PeerJ Comput Sci. 2022 Apr 25;8:e961. doi: 10.7717/peerj-cs.961. eCollection 2022.