Conversion of adverse data corpus to shrewd output using sampling metrics.

Authors

Ashraf Shahzad, Saleem Sehrish, Ahmed Tauqeer, Aslam Zeeshan, Muhammad Durr

Affiliations

College of Internet of Things Engineering, Hohai University, Changzhou, Jiangsu, 210032, China.

Muhammad Nawaz Sharif University of Engineering & Technology, Multan, 66000, Pakistan.

Publication information

Vis Comput Ind Biomed Art. 2020 Aug 11;3(1):19. doi: 10.1186/s42492-020-00055-9.

Abstract

In an imbalanced dataset, at least one class is represented by far fewer samples than the others. A machine learning algorithm (classifier) trained on an imbalanced dataset predicts the majority (frequently occurring) class more often than the minority (rarely occurring) classes. Training on an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques to reduce class imbalance can improve their performance. In this study, we consider an imbalanced dataset from an educational context. First, we examine the shortcomings of classifying an imbalanced dataset. Then, we apply data-level algorithms for class balancing and compare the performance of the classifiers. Classifier performance is measured using metrics derived from the confusion matrices, such as accuracy, precision, recall, and F-measure. The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that both undersampling and oversampling are effective for balancing datasets, but oversampling performs better.
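The abstract's two main ideas, confusion-matrix metrics and data-level resampling, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the "pass"/"fail" labels, and the confusion-matrix counts are hypothetical, and plain random resampling is assumed here because the abstract does not say which data-level algorithms were used.

```python
import random
from collections import Counter


def minority_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure, treating the minority class as positive."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Hypothetical counts: 95 majority and 5 minority samples, with a classifier
# that finds only 1 of the 5 minority samples and raises 1 false alarm.
acc, prec, rec, f1 = minority_metrics(tp=1, fp=1, fn=4, tn=94)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F={f1:.2f}")

# Hypothetical educational dataset: sample ids with "pass"/"fail" labels.
ids = list(range(100))
labels = ["pass"] * 95 + ["fail"] * 5
_, over_labels = random_oversample(ids, labels)
_, under_labels = random_undersample(ids, labels)
print(Counter(over_labels), Counter(under_labels))
```

With these made-up counts the accuracy is 0.95 while the minority-class recall is only 0.2, which is the pattern the abstract describes. Oversampling keeps every original sample and grows the minority class, whereas undersampling discards majority samples; resampling would normally be applied to the training split only.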

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a35f/7417470/c0fabc27667c/42492_2020_55_Fig1_HTML.jpg
