Centroid Estimation With Guaranteed Efficiency: A General Framework for Weakly Supervised Learning.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2841-2855. doi: 10.1109/TPAMI.2020.3044997. Epub 2022 May 5.

DOI:10.1109/TPAMI.2020.3044997

PMID:33320809

Abstract

In this paper, we propose a general framework termed centroid estimation with guaranteed efficiency (CEGE) for weakly supervised learning (WSL) with incomplete, inexact, and inaccurate supervision. The core of our framework is to devise an unbiased and statistically efficient risk estimator that is applicable to various weak supervision. Specifically, by decomposing the loss function (e.g., the squared loss and hinge loss) into a label-independent term and a label-dependent term, we discover that only the latter is influenced by the weak supervision and is related to the centroid of the entire dataset. Therefore, by constructing two auxiliary pseudo-labeled datasets with synthesized labels, we derive unbiased estimates of centroid based on the two auxiliary datasets, respectively. These two estimates are further linearly combined with a properly decided coefficient which makes the final combined estimate not only unbiased but also statistically efficient. This is better than some existing methods that only care about the unbiasedness of estimation but ignore the statistical efficiency. The good statistical efficiency of the derived estimator is guaranteed as we theoretically prove that it acquires the minimum variance when estimating the centroid. As a result, intensive experimental results on a large number of benchmark datasets demonstrate that our CEGE generally obtains better performance than the existing approaches related to typical WSL problems including semi-supervised learning, positive-unlabeled learning, multiple instance learning, and label noise learning.
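The decomposition and the combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' CEGE implementation. It assumes a linear model f(x) = w @ x, labels in {-1, +1}, the squared loss, and two uncorrelated unbiased estimates with known variances; the paper's own coefficient may be derived under different conditions. All function names here are hypothetical.

```python
import numpy as np

def squared_loss_risk(w, X, y):
    """Empirical squared-loss risk of a linear model f(x) = w @ x."""
    return np.mean((y - X @ w) ** 2)

def decomposed_risk(w, X, centroid):
    """Same risk, split into a label-independent and a label-dependent term.

    Assumes labels in {-1, +1}, so y**2 == 1 and the labels enter only
    through the centroid, i.e. the mean of y_i * x_i.
    """
    label_independent = np.mean(1.0 + (X @ w) ** 2)
    label_dependent = -2.0 * (w @ centroid)
    return label_independent + label_dependent

def combine_min_variance(est_a, est_b, var_a, var_b):
    """Linearly combine two unbiased, uncorrelated estimates (assumption).

    alpha = var_b / (var_a + var_b) minimises the variance of
    alpha * est_a + (1 - alpha) * est_b. The weights sum to 1, so the
    combined estimate remains unbiased.
    """
    alpha = var_b / (var_a + var_b)
    combined = alpha * est_a + (1.0 - alpha) * est_b
    combined_var = var_a * var_b / (var_a + var_b)
    return combined, alpha, combined_var

# Synthetic data, used only to check the decomposition numerically.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)
w = rng.normal(size=3)
centroid = (y[:, None] * X).mean(axis=0)
```

With fully labeled data the two risk functions agree up to floating-point error. Under weak supervision only `centroid` would have to be estimated, which is why the variance of that estimate matters for the quality of the risk estimate.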

Similar articles

Centroid Estimation With Guaranteed Efficiency: A General Framework for Weakly Supervised Learning.

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2841-2855. doi: 10.1109/TPAMI.2020.3044997. Epub 2022 May 5.

Towards Safe Weakly Supervised Learning.

IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):334-346. doi: 10.1109/TPAMI.2019.2922396. Epub 2020 Dec 4.

Weakly Semi-supervised phenotyping using Electronic Health records.

J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.

Learning With Proper Partial Labels.

Neural Comput. 2022 Dec 14;35(1):58-81. doi: 10.1162/neco_a_01554.

SeLa-MIL: Developing an instance-level classifier via weakly-supervised self-training for whole slide image classification.

Comput Methods Programs Biomed. 2025 Apr;261:108614. doi: 10.1016/j.cmpb.2025.108614. Epub 2025 Jan 27.

Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.

Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.

Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images.

Med Image Anal. 2023 Jan;83:102673. doi: 10.1016/j.media.2022.102673. Epub 2022 Oct 26.

Class-Wise Denoising for Robust Learning Under Label Noise.

IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):2835-2848. doi: 10.1109/TPAMI.2022.3178690. Epub 2023 Feb 3.

Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning.

IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):918-932. doi: 10.1109/TPAMI.2019.2941684. Epub 2021 Feb 4.

Weakly Supervised AUC Optimization: A Unified Partial AUC Approach.

IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4780-4795. doi: 10.1109/TPAMI.2024.3357814. Epub 2024 Jun 5.

Cited by

A recent survey on instance-dependent positive and unlabeled learning.

Fundam Res. 2022 Oct 12;5(2):796-803. doi: 10.1016/j.fmre.2022.09.019. eCollection 2025 Mar.
