
Beyond Majority Voting: A Coarse-to-Fine Label Filtration for Heavily Noisy Labels.

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Dec;30(12):3774-3787. doi: 10.1109/TNNLS.2019.2899045. Epub 2019 Mar 15.

DOI: 10.1109/TNNLS.2019.2899045
PMID: 30892236
Abstract

Crowdsourcing has become the most appealing way to obtain a plethora of labels at low cost. Nevertheless, labels from amateur workers are often noisy, which inevitably degrades the robustness of subsequent learning models. To improve label quality for subsequent use, majority voting (MV) is widely used to aggregate crowdsourced labels because of its simplicity and scalability. However, when crowdsourced labels are "heavily" noisy (e.g., 40% noisy labels), MV may not work well because of the "garbage (heavily noisy labels) in, garbage (full aggregated labels) out" problem. This issue inspires us to ask: if the ultimate target is to learn a robust model from noisy labels, why not provide partial aggregated labels and ensure that these labels are reliable enough for learning models? To address this challenge by improving MV, we propose a coarse-to-fine label filtration model called the double filter machine (DFM), which applies a (majority) voting filter and a sparse filter in series. Specifically, the DFM refines crowdsourced labels from coarse filtering to fine filtering. In the coarse filtering stage, the DFM aggregates crowdsourced labels with the voting filter, which yields (quality-acceptable) full aggregated labels. In the fine filtering stage, the DFM further extracts a set of high-quality labels from the full aggregated labels with the sparse filter, since this filter can identify high-quality labels by support selection. Based on insights from compressed sensing, the DFM recovers a ground-truth signal from heavily noisy data under a restricted isometry property. To sum up, the primary benefits of the DFM are that it keeps the scalability of the voting filter while improving robustness through the sparse filter. We also derive theoretical guarantees for the convergence and recovery of the DFM and analyze its complexity. We conduct comprehensive experiments on both UCI simulated and AMT crowdsourced datasets. Empirical results show that the partial aggregated labels provided by the DFM effectively improve the robustness of learning models.
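The coarse-to-fine idea in the abstract can be illustrated with a small Python sketch. This is not the authors' DFM: the abstract does not specify the sparse filter, so the fine stage below is replaced by a simple agreement threshold as a stand-in. The function names, the `min_agreement` parameter, and the toy data are all hypothetical and only show how a full set of aggregated labels can be narrowed to a more reliable partial set.

```python
from collections import Counter


def majority_vote(labels):
    """Return (winning label, fraction of votes it received) for one item."""
    counts = Counter(labels)
    # most_common(1) returns the top (label, count) pair;
    # ties are resolved by first-seen order.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def coarse_to_fine(crowd_labels, min_agreement=0.8):
    """Coarse stage: majority vote gives a label for every item.

    Fine stage (stand-in, NOT the paper's sparse filter): keep only the
    items whose winning label reached at least `min_agreement` of the votes.
    """
    full = {}     # full aggregated labels (one per item)
    partial = {}  # partial aggregated labels (high-agreement subset)
    for item, labels in crowd_labels.items():
        label, agreement = majority_vote(labels)
        full[item] = label
        if agreement >= min_agreement:
            partial[item] = label
    return full, partial


# Toy data: five hypothetical workers label four items with 0 or 1.
crowd_labels = {
    "a": [1, 1, 1, 1, 1],
    "b": [1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1],
    "d": [0, 1, 1, 0, 0],
}
full, partial = coarse_to_fine(crowd_labels)
print(full)     # every item receives a label
print(partial)  # only high-agreement items are kept
```

With this toy data, items "b" and "d" win by only 3 of 5 votes, so they receive a label in the coarse stage but are dropped in the fine stage. A downstream model would then train on the partial set only, which is the trade-off the abstract describes: fewer labels, but more reliable ones.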


Similar articles

1. Beyond Majority Voting: A Coarse-to-Fine Label Filtration for Heavily Noisy Labels.
IEEE Trans Neural Netw Learn Syst. 2019 Dec;30(12):3774-3787. doi: 10.1109/TNNLS.2019.2899045. Epub 2019 Mar 15.
2. Progressive Stochastic Learning for Noisy Labels.
IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):5136-5148. doi: 10.1109/TNNLS.2018.2792062. Epub 2018 Feb 5.
3. Improving Crowdsourced Label Quality Using Noise Correction.
IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1675-1688. doi: 10.1109/TNNLS.2017.2677468. Epub 2017 Mar 22.
4. Max-Margin Majority Voting for Learning from Crowds.
IEEE Trans Pattern Anal Mach Intell. 2019 Oct;41(10):2480-2494. doi: 10.1109/TPAMI.2018.2860987. Epub 2018 Jul 31.
5. Learning From Crowds With Multiple Noisy Label Distribution Propagation.
IEEE Trans Neural Netw Learn Syst. 2022 Nov;33(11):6558-6568. doi: 10.1109/TNNLS.2021.3082496. Epub 2022 Oct 27.
6. CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data.
Adv Knowl Discov Data Min. 2021 May;12713:181-193. doi: 10.1007/978-3-030-75765-6_15. Epub 2021 May 8.
7. Domain-Weighted Majority Voting for Crowdsourcing.
IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):163-174. doi: 10.1109/TNNLS.2018.2836969. Epub 2018 Jun 5.
8. Partial Multi-Label Learning With Noisy Label Identification.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3676-3687. doi: 10.1109/TPAMI.2021.3059290. Epub 2022 Jun 3.
9. Deep Learning from Noisy Image Labels with Quality Embedding.
IEEE Trans Image Process. 2018 Oct 24. doi: 10.1109/TIP.2018.2877939.
10. Crowdsourced Label Aggregation Using Bilayer Collaborative Clustering.
IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):3172-3185. doi: 10.1109/TNNLS.2018.2890148. Epub 2019 Jan 25.

Cited by

1. Incorporating label uncertainty during the training of convolutional neural networks improves performance for the discrimination between certain and inconclusive cases in dopamine transporter SPECT.
Eur J Nucl Med Mol Imaging. 2025 Mar;52(4):1535-1548. doi: 10.1007/s00259-024-06988-0. Epub 2024 Nov 27.