Centre National de la Recherche Scientifique & University of Brest, Laboratoire Géosciences Océan, Institut Universitaire Européen de la Mer, 29280 Plouzané, France.
Ocean Acoustics Lab, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, 27570 Bremerhaven, Germany.
J Acoust Soc Am. 2018 Aug;144(2):740. doi: 10.1121/1.5049803.
Evaluation of the performance of computer-based algorithms to automatically detect mammalian vocalizations often relies on comparisons between detector outputs and a reference data set, generally obtained by manual annotation of acoustic recordings. To explore the reproducibility of these annotations, inter- and intra-analyst variability in manually annotated Antarctic blue whale (ABW) Z-calls is investigated using annotations made by two analysts in acoustic data from two ocean basins representing different scenarios in terms of call abundance and background noise. Manual annotations exhibit strong inter- and intra-analyst variability, with less than 50% agreement between analysts. This variability is mainly caused by the difficulty of reliably and reproducibly distinguishing single calls in an ABW chorus made of overlapping distant calls. Furthermore, the performance of two automated detectors, one based on spectrogram correlation and the other on a subspace-detection strategy, is evaluated by comparing detector output to a "conservative" manually annotated reference data set, which comprises only the events on which the analysts' annotations match. This study highlights the need for a standardized approach for human annotations and automatic detections, including a quantitative description of their performance, to improve the comparability of acoustic data, which is particularly relevant in the context of collaborative approaches in collecting and analyzing large passive acoustic data sets.
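The comparison described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the matching rule (a greedy one-to-one pairing of event times within a tolerance), the 5 s tolerance, the definition of agreement as matched events over the union of both analysts' events, and all function names and sample times are assumptions made for illustration only.

```python
def match_events(a, b, tol=5.0):
    """Greedy one-to-one pairing of event times (in seconds) that lie
    within `tol` seconds of each other. Hypothetical matching rule."""
    a, b = sorted(a), sorted(b)
    pairs = []
    i = j = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            pairs.append((a[i], b[j]))
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] has no partner in b
        else:
            j += 1  # b[j] has no partner in a
    return pairs


def agreement(a, b, tol=5.0):
    """Matched events divided by the union of both annotation sets
    (one possible definition of agreement, assumed here)."""
    matched = len(match_events(a, b, tol))
    union = len(a) + len(b) - matched
    return matched / union if union else 1.0


def conservative_reference(a, b, tol=5.0):
    """Keep only events annotated by both analysts (first analyst's times)."""
    return [t_a for t_a, _ in match_events(a, b, tol)]


def evaluate_detector(detections, reference, tol=5.0):
    """Recall and precision of detections against a reference set."""
    tp = len(match_events(detections, reference, tol))
    recall = tp / len(reference) if reference else 0.0
    precision = tp / len(detections) if detections else 0.0
    return recall, precision


# Made-up annotation times, in seconds from the start of a recording.
analyst_1 = [10.0, 75.0, 140.0, 300.0]
analyst_2 = [12.0, 141.5, 210.0]
detector = [11.0, 139.0, 500.0]

reference = conservative_reference(analyst_1, analyst_2)
print(agreement(analyst_1, analyst_2))
print(evaluate_detector(detector, reference))
```

With a conservative reference of this kind, a detection of a real call that only one analyst marked is counted as a false positive, so the precision obtained this way is best read as a lower bound.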