PROTAX-Sound: A probabilistic framework for automated animal sound identification.

Authors

Ulisses Moliterno de Camargo, Panu Somervuo, Otso Ovaskainen

Affiliations

Department of Biosciences, University of Helsinki, Helsinki, Finland.

Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.

Publication

PLoS One. 2017 Sep 1;12(9):e0184048. doi: 10.1371/journal.pone.0184048. eCollection 2017.

DOI: 10.1371/journal.pone.0184048

PMID: 28863178

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5581177/

Abstract

Autonomous audio recording is stimulating a new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework for the probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can use as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (segments of the audio file that contain a vocalization to be classified), extracts acoustic features from them, and compares them with samples in a reference database. The output of PROTAX-Sound is a probabilistic classification of each vocalization, including the possibility that it represents a species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best-performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.
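
The abstract describes the classifier as a multinomial regression whose possible outcomes include a species absent from the reference database. The following is a minimal sketch of that idea, not the authors' implementation. It assumes hypothetical predictors (two similarity scores per reference species), made-up regression weights, and a single fixed score for the "unknown species" outcome. The paper's actual predictors and parameterization are not given in this record. Each known species gets a linear score, one extra outcome is appended, and a softmax turns the scores into probabilities:

```python
import math


def classify_vocalization(predictors, weights, unknown_score):
    """Multinomial-logit class probabilities for one vocalization (sketch).

    predictors: one row per known species; each row holds hypothetical
        predictor values (e.g. similarity scores from several classifiers)
        comparing the vocalization with that species' reference samples.
    weights: hypothetical regression coefficients shared across species.
    unknown_score: hypothetical linear score of the extra outcome
        "species not present in the reference database".
    Returns probabilities for each known species followed by "unknown".
    """
    # Linear score for each known species: dot product of its predictors
    # with the shared weights.
    scores = [sum(w * x for w, x in zip(weights, row)) for row in predictors]
    # Append the score of the "unknown species" outcome.
    scores.append(unknown_score)
    # Softmax; subtracting the maximum score avoids overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage with invented numbers (illustration only, not data from the paper).
labels = ["species_A", "species_B", "species_C", "unknown"]
predictors = [[0.9, 0.8], [0.2, 0.1], [0.4, 0.3]]
weights = [3.0, 2.0]
probs = classify_vocalization(predictors, weights, unknown_score=1.0)
for name, p in zip(labels, probs):
    print(f"{name}: {p:.3f}")

# A vocalization that resembles none of the reference species.
novel = classify_vocalization([[0.0, 0.0]] * 3, weights, unknown_score=1.0)
```

Because the scores pass through a softmax, the outputs always sum to one. When the vocalization is dissimilar to every reference species, the "unknown" outcome can receive the largest probability. Whether such probabilities are well calibrated depends on how the weights are fitted, which this sketch does not cover.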
