Thakur Anshul, Thapar Daksh, Rajan Padmanabhan, Nigam Aditya
School of Computing and Electrical Engineering, IIT Mandi, Mandi, Himachal Pradesh-175005, India.
J Acoust Soc Am. 2019 Jul;146(1):534. doi: 10.1121/1.5118245.
Bioacoustic classification often suffers from the lack of labeled data. This hinders the effective utilization of state-of-the-art deep learning models in bioacoustics. To overcome this problem, the authors propose a deep metric learning-based framework that provides effective classification, even when only a small number of per-class training examples are available. The proposed framework utilizes a multiscale convolutional neural network and the proposed dynamic variant of the triplet loss to learn a transformation space where intra-class separation is minimized and inter-class separation is maximized by a dynamically increasing margin. The process of learning this transformation is known as deep metric learning. The triplet loss analyzes three examples (referred to as a triplet) at a time to perform deep metric learning. The number of possible triplets increases cubically with the dataset size, making triplet loss more suitable than the cross-entropy loss in data-scarce conditions. Experiments on three different publicly available datasets show that the proposed framework performs better than existing bioacoustic classification methods. Experimental results also demonstrate the superiority of dynamic triplet loss over cross-entropy loss in data-scarce conditions. Furthermore, unlike existing bioacoustic classification methods, the proposed framework has been extended to provide open-set classification.
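The triplet loss with a growing margin described above can be illustrated with a minimal sketch. The abstract does not give the margin schedule or the distance function, so the linear schedule in `dynamic_margin` (with its `start`, `step`, and `cap` parameters), the Euclidean distance, and the toy embeddings are all assumptions made for illustration, not the authors' formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dynamic_margin(epoch, start=0.2, step=0.1, cap=1.0):
    """Hypothetical schedule: the margin grows linearly with the epoch, up to a cap."""
    return min(cap, start + step * epoch)


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet hinge loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (assumed): the positive shares the anchor's class,
# the negative belongs to a different class.
anchor = [0.0, 0.0]
positive = [0.3, 0.4]  # distance 0.5 from the anchor
negative = [0.6, 0.8]  # distance 1.0 from the anchor

# The same triplet is penalized more as the margin increases across epochs.
losses = [
    triplet_loss(anchor, positive, negative, dynamic_margin(epoch))
    for epoch in range(6)
]
```

In this sketch the triplet incurs no loss while the margin is below the 0.5 gap between the two distances, and starts contributing once the margin exceeds that gap, which is one way a growing margin can keep pushing classes further apart as training proceeds.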