
mCRF and mRD: Two Classification Methods Based on a Novel Multiclass Label Noise Filtering Learning Framework.

Author information

Xia Shuyin, Chen Baiyun, Wang Guoyin, Zheng Yong, Gao Xinbo, Giem Elisabeth, Chen Zizhong

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2916-2930. doi: 10.1109/TNNLS.2020.3047046. Epub 2022 Jul 6.

Abstract

Mitigating label noise is a crucial problem in classification. Noise filtering is an effective way of dealing with label noise that requires neither an estimate of the noise rate nor any particular loss function. However, most filtering methods focus on binary classification, leaving the more difficult problem of multiclass classification relatively unexplored. To remedy this deficit, we present a definition of label noise in the multiclass setting and propose a general framework for a novel label noise filtering learning method for multiclass classification. Two noise filtering methods for multiclass classification, multiclass complete random forest (mCRF) and multiclass relative density (mRD), are derived from their binary counterparts using our proposed framework. In addition, to optimize the NI_threshold hyperparameter in mCRF, we propose two new optimization methods: a voting cross-validation method and an adaptive method that employs a 2-means clustering algorithm. Furthermore, we incorporate SMOTE into our label noise filtering learning framework to handle the ubiquitous problem of imbalanced data in multiclass classification. We report experiments on both synthetic data sets and UCI benchmarks to demonstrate that our proposed methods are highly robust to label noise in comparison with state-of-the-art baselines. All code and data results are available at https://github.com/syxiaa/Multiclass-Label-Noise-Filtering-Learning.
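The abstract mentions an adaptive method that uses 2-means clustering to choose NI_threshold. The sketch below only illustrates the general idea of splitting one-dimensional per-sample scores into two groups and taking the boundary between them as a threshold. It is not the authors' implementation: the function names `two_means_threshold` and `filter_by_score`, the example scores, and the convention that a higher score means "more likely mislabeled" are all assumptions made for illustration.

```python
def two_means_threshold(scores, max_iter=100):
    """Cluster 1-D scores into two groups (Lloyd's algorithm, k=2)
    and return the midpoint between the two centroids."""
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, hi = min(scores), max(scores)  # initial centroids
    if lo == hi:
        return lo  # all scores identical; nothing to separate
    for _ in range(max_iter):
        boundary = (lo + hi) / 2.0
        low = [s for s in scores if s <= boundary]
        high = [s for s in scores if s > boundary]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def filter_by_score(samples, scores):
    """Keep samples whose score is at or below the 2-means threshold.
    Assumption: a higher score marks a sample as more likely noisy."""
    threshold = two_means_threshold(scores)
    kept = [x for x, s in zip(samples, scores) if s <= threshold]
    return kept, threshold


if __name__ == "__main__":
    # Hypothetical scores: three low values and two clearly higher ones.
    samples = ["a", "b", "c", "d", "e"]
    scores = [0.10, 0.20, 0.15, 0.90, 0.85]
    kept, threshold = filter_by_score(samples, scores)
    print(kept, threshold)
```

The appeal of this kind of threshold is that it adapts to the score distribution of each data set rather than relying on a fixed, hand-tuned cutoff; how the scores themselves are computed in mCRF is described in the paper and the linked repository.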
