
Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data.

Authors

Gu Bin, Zhai Zhou, Deng Cheng, Huang Heng

Publication information

IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):4111-4122. doi: 10.1109/TNNLS.2020.3016928. Epub 2021 Aug 31.

DOI: 10.1109/TNNLS.2020.3016928
PMID: 32845848

Abstract

Active learning is an important paradigm in machine learning and data mining that aims to train effective classifiers with as few labeled samples as possible. Querying discriminative (informative) and representative samples is the state-of-the-art approach to active learning. Fully utilizing a large amount of unlabeled data provides a second opportunity to improve active learning performance. Although several active learning methods that incorporate semisupervised learning have been proposed, fast active learning that both fully exploits unlabeled data and queries discriminative and representative samples remains an open problem. To address this, we propose a new efficient batch mode active learning algorithm. Specifically, we first provide an active learning risk bound that fully considers the unlabeled samples when characterizing informativeness and representativeness. Based on the risk bound, we derive a new objective function for batch mode active learning. We then propose a wrapper algorithm to solve the objective function, which essentially alternates between training a semisupervised classifier and selecting discriminative and representative samples. In particular, to avoid retraining the semisupervised classifier from scratch after each query, we design two procedures based on the path-following technique that efficiently remove multiple queried samples from the unlabeled data set and add them to the labeled data set. Extensive experiments on a variety of benchmark data sets show that our algorithm not only generalizes better than state-of-the-art active learning approaches but is also highly efficient.
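
The alternating structure described in the abstract (train a semisupervised classifier, then select a batch of discriminative and representative samples, then repeat) can be illustrated with a toy sketch. This is not the authors' algorithm: the risk bound, the objective function, and the path-following updates are not given in the abstract and are not reproduced here. Every name below (`train_semisupervised`, `informativeness`, `representativeness`, `select_batch`, `active_learning_loop`, `trade_off`, `gamma`) is hypothetical, the classifier is a simple nearest-centroid self-training stand-in, and the selection score is an assumed weighted sum of a margin-based term and a density-based term.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def train_semisupervised(X_lab, y_lab, X_unl, rounds=3):
    """Toy stand-in for a semisupervised classifier (an assumption, not the
    paper's model): nearest-centroid self-training with labels +1 / -1.
    Assumes both classes appear in the labeled set."""
    pos = [x for x, y in zip(X_lab, y_lab) if y == 1]
    neg = [x for x, y in zip(X_lab, y_lab) if y == -1]
    c_pos, c_neg = centroid(pos), centroid(neg)
    for _ in range(rounds):
        # Pseudo-label the unlabeled pool, then refit centroids on all data.
        near_pos = [x for x in X_unl if sq_dist(x, c_pos) < sq_dist(x, c_neg)]
        near_neg = [x for x in X_unl if sq_dist(x, c_pos) >= sq_dist(x, c_neg)]
        c_pos, c_neg = centroid(pos + near_pos), centroid(neg + near_neg)
    return c_pos, c_neg


def informativeness(x, model):
    """Larger when x is close to the decision boundary (small margin)."""
    c_pos, c_neg = model
    margin = abs(sq_dist(x, c_neg) - sq_dist(x, c_pos))
    return 1.0 / (1.0 + margin)


def representativeness(i, X_unl, gamma=1.0):
    """Mean RBF similarity of sample i to the rest of the unlabeled pool."""
    others = [x for j, x in enumerate(X_unl) if j != i]
    if not others:
        return 0.0
    return sum(math.exp(-gamma * sq_dist(X_unl[i], x)) for x in others) / len(others)


def select_batch(X_unl, model, batch_size, trade_off=0.5):
    """Return indices of the batch_size highest-scoring unlabeled samples.
    The weighted-sum score is an assumed heuristic, not the paper's objective."""
    scores = [
        trade_off * informativeness(x, model)
        + (1.0 - trade_off) * representativeness(i, X_unl)
        for i, x in enumerate(X_unl)
    ]
    ranked = sorted(range(len(X_unl)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def active_learning_loop(X_lab, y_lab, X_unl, oracle, n_rounds, batch_size):
    """Alternate between training and batch selection.
    This sketch retrains from scratch every round; the paper's path-following
    procedures are designed to avoid exactly that cost."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(n_rounds):
        if not X_unl:
            break
        model = train_semisupervised(X_lab, y_lab, X_unl)
        picked = select_batch(X_unl, model, batch_size)
        # Pop from the highest index down so earlier pops do not shift later ones.
        for i in sorted(picked, reverse=True):
            x = X_unl.pop(i)
            X_lab.append(x)
            y_lab.append(oracle(x))  # query the (hypothetical) labeling oracle
    model = train_semisupervised(X_lab, y_lab, X_unl)
    return model, X_lab, y_lab, X_unl
```

A caller would pass a small labeled seed set containing both classes, a pool of unlabeled feature vectors, and an `oracle` callable that returns +1 or -1 for a queried sample. The `trade_off` weight controls how strongly the selection favors boundary samples over samples from dense regions of the pool.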

Similar articles

1. Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data.
   IEEE Trans Neural Netw Learn Syst. 2021 Sep;32(9):4111-4122. doi: 10.1109/TNNLS.2020.3016928. Epub 2021 Aug 31.
2. Batch Mode Active Sampling based on Marginal Probability Distribution Matching.
   KDD. 2012;2012:741-749. doi: 10.1145/2339530.2339647.
3. Active Learning by Querying Informative and Representative Examples.
   IEEE Trans Pattern Anal Mach Intell. 2014 Oct;36(10):1936-49. doi: 10.1109/TPAMI.2014.2307881.
4. Robust and Discriminative Labeling for Multi-Label Active Learning Based on Maximum Correntropy Criterion.
   IEEE Trans Image Process. 2017 Apr;26(4):1694-1707. doi: 10.1109/TIP.2017.2651372. Epub 2017 Jan 10.
5. Double-Criteria Active Learning for Multiclass Brain-Computer Interfaces.
   Comput Intell Neurosci. 2020 Mar 10;2020:3287589. doi: 10.1155/2020/3287589. eCollection 2020.
6. An active learning approach with uncertainty, representativeness, and diversity.
   ScientificWorldJournal. 2014;2014:827586. doi: 10.1155/2014/827586. Epub 2014 Aug 11.
7. Fast and Effective Active Clustering Ensemble Based on Density Peak.
   IEEE Trans Neural Netw Learn Syst. 2021 Aug;32(8):3593-3607. doi: 10.1109/TNNLS.2020.3015795. Epub 2021 Aug 3.
8. Online Semisupervised Active Classification for Multiview PolSAR Data.
   IEEE Trans Cybern. 2022 Jun;52(6):4415-4429. doi: 10.1109/TCYB.2020.3026741. Epub 2022 Jun 16.
9. Semisupervised learning for a hybrid generative/discriminative classifier based on the maximum entropy principle.
   IEEE Trans Pattern Anal Mach Intell. 2008 Mar;30(3):424-37. doi: 10.1109/TPAMI.2007.70710.
10. Exploring Representativeness and Informativeness for Active Learning.
    IEEE Trans Cybern. 2017 Jan;47(1):14-26. doi: 10.1109/TCYB.2015.2496974. Epub 2015 Nov 17.

Cited by

1. Cost-Effective Multitask Active Learning in Wearable Sensor Systems.
   Sensors (Basel). 2025 Feb 28;25(5):1522. doi: 10.3390/s25051522.