
Progressive Stochastic Learning for Noisy Labels.

Author Information

Ivor W. Tsang, Celina P. Yu

Publication Information

IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):5136-5148. doi: 10.1109/TNNLS.2018.2792062. Epub 2018 Feb 5.

Abstract

Large-scale learning problems require a plethora of labels that can be efficiently collected from crowdsourcing services at low cost. However, labels annotated by crowdsourced workers are often noisy, which inevitably degrades the performance of large-scale optimization methods, including the prevalent stochastic gradient descent (SGD). Specifically, these noisy labels adversely affect updates of the primal variable in conventional SGD. To address this challenge, we propose a robust SGD mechanism called progressive stochastic learning (POSTAL), which naturally integrates the learning regime of curriculum learning (CL) with the update process of vanilla SGD. Our inspiration comes from the progressive learning process of CL, namely learning from "easy" tasks to "complex" tasks. Through the robust learning process of CL, POSTAL aims to yield robust updates of the primal variable on an ordered label sequence, namely from "reliable" labels to "noisy" labels. To realize the POSTAL mechanism, we design a cluster of "screening losses," which sorts all labels from the reliable region to the noisy region. To sum up, POSTAL using screening losses ensures robust updates of the primal variable on reliable labels first, then on noisy labels incrementally until convergence. In theory, we derive the convergence rate of POSTAL realized by screening losses. Meanwhile, we provide a robustness analysis of representative screening losses. Experimental results on UCI [1] simulated and Amazon Mechanical Turk crowdsourcing data sets show that POSTAL using screening losses is more effective and robust than several existing baselines.

[1] UCI is the abbreviation of University of California, Irvine.
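The abstract only outlines the mechanism, so the following is a minimal sketch of the general idea rather than the authors' implementation: a linear classifier is trained with vanilla SGD while examples are reordered each epoch by their current per-example loss, and the trusted set grows from low-loss ("reliable") toward high-loss ("noisy") labels. The function name postal_sketch, the linear start_frac schedule, and the use of the plain logistic loss as a stand-in for the paper's screening losses are all illustrative assumptions.

# Hypothetical POSTAL-style training loop (illustrative sketch, not the paper's code).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def postal_sketch(X, y, epochs=20, lr=0.1, start_frac=0.3):
    """X: (n, d) features; y: (n,) labels in {0, 1}.
    start_frac: fraction of lowest-loss examples trusted in the first epoch."""
    n, d = X.shape
    w = np.zeros(d)
    for epoch in range(epochs):
        # Screening step: per-example logistic loss under the current model
        # (a stand-in for the paper's screening losses).
        p = sigmoid(X @ w)
        losses = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        order = np.argsort(losses)  # reliable (low loss) -> noisy (high loss)
        # Curriculum step: linearly grow the active set from reliable to noisy.
        frac = start_frac + (1.0 - start_frac) * epoch / max(epochs - 1, 1)
        active = order[: max(1, int(frac * n))]
        # Vanilla SGD pass over the currently trusted examples.
        for i in np.random.permutation(active):
            grad = (sigmoid(X[i] @ w) - y[i]) * X[i]
            w -= lr * grad
    return w

Note that the ordering is recomputed every epoch, so the "reliable" region tracks the current model as it improves, which mirrors the easy-to-complex progression that the paper borrows from curriculum learning.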

