Mey Alexander, Loog Marco
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4747-4767. doi: 10.1109/TPAMI.2022.3198175. Epub 2023 Mar 7.
Semi-supervised learning is the learning setting in which both labeled and unlabeled data are available. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For practical purposes, it is therefore instructive to understand the underlying theory and the learning behavior that comes with it. This survey gathers results about the gains one can achieve with semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: What are the limits of semi-supervised learning in terms of improving supervised methods? What are the assumptions of different methods? What can we achieve if the assumptions are true? Because the precise assumptions are essential, the survey pays particular attention to them.