Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.

Affiliations

Faculty of Computing and Informatics, Jimma Institute of Technology, Jimma, Ethiopia.

Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.

Publication information

PLoS One. 2021 May 21;16(5):e0251902. doi: 10.1371/journal.pone.0251902. eCollection 2021.

Abstract

The volume of Amharic digital documents has grown rapidly in recent years, which makes automatic document categorization essential. In this paper, we present a novel dimension reduction approach that improves classification accuracy by combining feature selection and feature extraction. The method uses Information Gain (IG), the Chi-square test (CHI), and Document Frequency (DF) to select important features, and Principal Component Analysis (PCA) to refine the selected features. We evaluate the proposed method on a dataset containing 9 news categories. Our experimental results show that the proposed dimension reduction method outperforms the compared methods. Classification accuracy with the new method is 92.60%, which is 13.48%, 16.51%, and 10.19% higher than with IG, CHI, and DF, respectively. Further work is required, since classification accuracy still decreases as the feature size is reduced to save computational time.
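The abstract does not state how the IG, CHI, and DF scores are combined or how many terms each filter keeps, so the following is only a minimal sketch under stated assumptions. It assumes a binary document-term matrix, takes the union of the top-k terms under each of the three filters, and then projects the selected columns onto their first principal components using NumPy's SVD. The union rule and all function names (`information_gain`, `chi_square`, `document_frequency`, `top_k`, `hybrid_reduce`) are hypothetical illustrations, not the authors' procedure.

```python
import numpy as np

def document_frequency(X):
    """DF: number of documents containing each term (X is a binary doc-term matrix)."""
    return X.sum(axis=0).astype(float)

def chi_square(X, y):
    """Per-term chi-square statistic from the 2x2 term/class table, maximised over classes."""
    n_docs = X.shape[0]
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        A = X[in_c].sum(axis=0)       # term present, class c
        B = X[~in_c].sum(axis=0)      # term present, other classes
        C = in_c.sum() - A            # term absent, class c
        D = (~in_c).sum() - B         # term absent, other classes
        num = n_docs * (A * D - B * C) ** 2
        den = (A + C) * (B + D) * (A + B) * (C + D)
        chi = np.divide(num, den, out=np.zeros(num.shape, dtype=float), where=den > 0)
        scores = np.maximum(scores, chi)
    return scores

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(X, y):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)] for each term t."""
    classes = np.unique(y)
    h_c = entropy(np.array([(y == c).mean() for c in classes]))
    scores = np.zeros(X.shape[1])
    for t in range(X.shape[1]):
        present = X[:, t] > 0
        cond = 0.0
        for mask in (present, ~present):
            if mask.any():
                probs = np.array([(y[mask] == c).mean() for c in classes])
                cond += mask.mean() * entropy(probs)
        scores[t] = h_c - cond
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring terms."""
    return set(np.argsort(scores)[::-1][:k].tolist())

def hybrid_reduce(X, y, k, n_components):
    """ASSUMPTION: union of the top-k terms under IG, CHI and DF, then PCA."""
    selected = sorted(
        top_k(information_gain(X, y), k)
        | top_k(chi_square(X, y), k)
        | top_k(document_frequency(X), k)
    )
    Xs = X[:, selected].astype(float)
    Xc = Xs - Xs.mean(axis=0)                      # centre columns before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                   # scores on leading components
    return selected, Z

# Toy example: 4 documents, 5 terms, 2 classes (made-up data for illustration).
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
])
y = np.array([0, 0, 1, 1])
selected, Z = hybrid_reduce(X, y, k=2, n_components=2)
print(Z.shape)  # (4, 2)
```

Taking the union keeps any term that at least one filter ranks highly, and PCA then compresses the possibly redundant selected terms into a few uncorrelated components. An intersection rule or a weighted score combination would be equally plausible readings of the abstract. In practice, the PCA step would usually run on weighted values (for example TF-IDF) rather than on the binary matrix used here.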

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76db/8139506/160b9b9ac2b1/pone.0251902.g001.jpg
