Kaewtip Kantapon, Alwan Abeer, O'Reilly Colm, Taylor Charles E
Department of Electrical Engineering, University of California, Los Angeles, 56-125B Engineering IV Building, Box 951594, Los Angeles, California 90095, USA.
Sigmedia, Department of Electronic and Electrical Engineering, Trinity College, Dublin, Ireland.
J Acoust Soc Am. 2016 Nov;140(5):3691. doi: 10.1121/1.4966592.
Automatic phrase detection systems for bird sounds are useful in several applications because they reduce the need for manual annotation. However, bird phrase detection is challenging due to limited training data and background noise. Limited data occur because of limited recordings or the existence of rare phrases. Background noise interference arises from the recording environment itself, for example wind or other animals. This paper presents a different approach to birdsong phrase classification using template-based techniques that remain suitable with limited training data and in noisy environments. The algorithm utilizes dynamic time warping (DTW) and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The performance of the proposed algorithm is compared with traditional DTW and hidden Markov model (HMM) methods under several training and test conditions. DTW works well when the data are limited, while HMMs do better when more data are available, yet both suffer when the background noise is severe. The proposed algorithm outperforms DTW and HMMs in most training and testing conditions, usually by a wide margin when the background noise level is high. The innovation of this work is that the proposed algorithm is robust to both limited training data and background noise.
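To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of template matching with classic DTW, plus a simple mask of high-energy time-frequency bins. It is illustrative only: the abstract does not specify how the prominent regions are selected or how they enter the templates, so the function names (`dtw_distance`, `prominent_mask`, `classify`), the Euclidean frame distance, and the `fraction` threshold are all assumptions rather than the authors' method.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature frames.

    Each frame is a list of floats (e.g. one spectrogram column).
    The Euclidean frame distance is an assumption for this sketch.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of b
                                 cost[i][j - 1],      # skip a frame of a
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m]


def prominent_mask(spec, fraction=0.5):
    """Flag bins whose energy is at least `fraction` of the peak energy.

    `fraction` is a hypothetical parameter, not taken from the paper.
    """
    peak = max(max(frame) for frame in spec)
    return [[1.0 if v >= fraction * peak else 0.0 for v in frame]
            for frame in spec]


def classify(test, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))


# Toy example: one-dimensional "frames" for two phrase classes.
templates = {
    "up": [[0.0], [1.0], [2.0]],
    "down": [[2.0], [1.0], [0.0]],
}
test = [[0.0], [0.0], [1.0], [2.0]]  # a time-stretched "up" phrase
label = classify(test, templates)
```

Because DTW allows frames to be repeated during alignment, the time-stretched test sequence matches the "up" template at zero cost even though the two sequences have different lengths. This is the property that makes template matching workable when only a few training examples per phrase are available.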