Visualfy, 46181 Benisanó, Spain.
Computer Science Department, Universitat de València, 46100 Burjassot, Spain.
Sensors (Basel). 2020 Jul 3;20(13):3741. doi: 10.3390/s20133741.
Open-set recognition (OSR) is a challenging machine learning problem that arises when a classifier is faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from known classes (seen during training) while rejecting unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which occurs when only a small number of positive samples is available for training a recognition system. With these two limitations in mind, a new dataset for OSR and FSL on audio data was recently released to promote research on solutions that address both. This paper proposes an audio OSR/FSL system divided into three steps: a high-level audio representation, feature embedding using two different autoencoder architectures, and a multi-layer perceptron (MLP) trained on latent-space representations to detect known classes and reject unwanted ones. An extensive set of experiments is carried out over multiple combinations of openness factors (OSR condition) and numbers of shots (FSL condition), showing the validity of the proposed approach and confirming superior performance with respect to a baseline system based on transfer learning.
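To make the two OSR notions in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the openness definition widely used in the OSR literature, 1 - sqrt(2 * N_train / (N_test + N_target)); the paper may use a variant. It also assumes that rejection is done by thresholding the classifier's highest class probability, which is one common strategy. The function names, the `threshold` value and the use of `-1` as the "unknown" label are all hypothetical.

```python
import math


def openness(n_train: int, n_target: int, n_test: int) -> float:
    """Openness of a recognition task (assumed, commonly used definition).

    n_train:  number of classes seen during training
    n_target: number of classes the system must recognize
    n_test:   number of classes that appear at test time
    Returns 0.0 for a closed-set problem; larger values mean a more open problem.
    """
    return 1.0 - math.sqrt(2.0 * n_train / (n_test + n_target))


def classify_open_set(probs, threshold=0.5):
    """Return the index of the most probable known class, or -1 ("unknown")
    when the top probability falls below the (hypothetical) threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else -1


if __name__ == "__main__":
    # Closed set: the same 5 classes in training, target and test.
    print(openness(5, 5, 5))
    # Open set: 10 additional unseen classes appear at test time.
    print(openness(5, 5, 15))
    # Confident prediction is accepted; a flat distribution is rejected.
    print(classify_open_set([0.1, 0.8, 0.1]))
    print(classify_open_set([0.4, 0.35, 0.25]))
```

In a pipeline like the one described, `probs` would be the MLP's output for the latent representation of one audio clip. The threshold would normally be tuned on validation data rather than fixed in advance.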