Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19.

Authors

Turlapati Venkata Pavan Kumar, Prusty Manas Ranjan

Affiliations

School of Computing, SRM Institute of Science and Technology, Kattankulathur, 603203, India.

Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai, 600127, India.

Publication information

Intell Based Med. 2020 Dec;3:100023. doi: 10.1016/j.ibmed.2020.100023. Epub 2020 Dec 3.

Abstract

Class imbalance is a common problem in real-world datasets. Classifiers trained on imbalanced data tend to become biased towards the majority classes, which reduces their performance. This problem is often tackled with over-sampling or under-sampling algorithms, among which the Synthetic Minority Oversampling Technique (SMOTE) stands out. SMOTE generates synthetic minority-class samples by forming linear combinations of each minority data point and its existing minority-class neighbors, with every minority sample producing an equal number of synthetic samples. With the world suffering from the COVID-19 pandemic, the authors applied this idea to improve classification performance in detecting the virus. This paper presents a modified version of SMOTE, called Outlier-SMOTE, in which each data point is oversampled according to its distance from the other data points. A data point that lies farther from the others is given greater importance and is oversampled more than its counterparts. Outlier-SMOTE reduces the chance of overlapping minority samples, which often occurs with the traditional SMOTE algorithm. The method is tested on five benchmark datasets and then on a COVID-19 dataset. F-measure, recall, and precision are used as the principal metrics for evaluating classifier performance, as is common for class-imbalanced datasets. The proposed algorithm performs considerably better than the traditional SMOTE algorithm on the datasets considered.
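The distance-weighted oversampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact weighting rule, so the code below assumes that each minority point's weight is its summed Euclidean distance to the other minority points, normalized to sum to one, and that synthetic points are created by SMOTE-style interpolation towards a randomly chosen nearest minority neighbor. The function name `outlier_weighted_smote` and the parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import numpy as np


def outlier_weighted_smote(X_min, n_new, k=5, seed=0):
    """Sketch of distance-weighted SMOTE (assumed weighting, not the paper's code).

    X_min : array of minority-class samples, shape (n, d), with n >= 2
    n_new : total number of synthetic samples to generate
    k     : number of nearest minority neighbors used for interpolation
    Returns (synthetic samples of shape (n_new, d), per-point sample counts).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n < 2:
        raise ValueError("need at least two minority samples")

    # Pairwise Euclidean distances between minority points.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumption: a point's importance is its total distance to the others,
    # so isolated (outlier-like) points receive larger weights.
    total = dist.sum(axis=1)
    if total.sum() > 0:
        weights = total / total.sum()
    else:
        weights = np.full(n, 1.0 / n)  # all points identical: uniform weights

    # Turn weights into integer counts that sum exactly to n_new
    # (floor, then hand the leftover samples to the largest fractional parts).
    raw = weights * n_new
    counts = np.floor(raw).astype(int)
    leftover = int(n_new - counts.sum())
    order = np.argsort(-(raw - counts))
    counts[order[:leftover]] += 1

    # k nearest minority neighbors of each point, excluding the point itself.
    k = min(k, n - 1)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # SMOTE-style interpolation: x_new = x_i + gap * (x_j - x_i), gap in [0, 1).
    synthetic = []
    for i, count in enumerate(counts):
        for _ in range(count):
            j = rng.choice(neighbors[i])
            gap = rng.random()
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, d), counts


if __name__ == "__main__":
    # Toy minority class: three clustered points and one distant point.
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
    X_syn, counts = outlier_weighted_smote(X_min, n_new=20, k=3)
    print("samples generated per minority point:", counts)
    print("synthetic set shape:", X_syn.shape)
```

In this toy run the distant point at (10, 10) receives the largest share of the synthetic samples, whereas standard SMOTE would give each of the four points the same number. The synthetic points would then be appended to the training set with the minority label before fitting a classifier and computing precision, recall, and F-measure.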

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f13/7710484/72644565ac16/gr1_lrg.jpg
