BioCPPNet: automatic bioacoustic source separation with deep neural networks.

Affiliation

Earth Species Project, Berkeley, CA, 94709, USA.

Publication information

Sci Rep. 2021 Dec 6;11(1):23502. doi: 10.1038/s41598-021-02790-2.

Abstract

We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a lightweight, modular, and robust U-Net-based machine learning architecture optimized for bioacoustic source separation across diverse biological taxa. Employing learnable or handcrafted encoders, BioCPPNet operates directly on the raw acoustic mixture waveform containing overlapping vocalizations and separates the input waveform into estimates corresponding to the sources in the mixture. Predictions are compared to the reference ground truth waveforms by searching over the space of (output, target) source order permutations, and we train using an objective function motivated by perceptual audio quality. We apply BioCPPNet to several species with unique vocal behavior, including macaques, bottlenose dolphins, and Egyptian fruit bats, and we evaluate reconstruction quality of separated waveforms using the scale-invariant signal-to-distortion ratio (SI-SDR) and downstream identity classification accuracy. We consider mixtures with two or three concurrent conspecific vocalizers, and we examine separation performance in open and closed speaker scenarios. To our knowledge, this paper redefines the state-of-the-art in end-to-end single-channel bioacoustic source separation in a permutation-invariant regime across a heterogeneous set of non-human species. This study serves as a major step toward the deployment of bioacoustic source separation systems for processing substantial volumes of previously unusable data containing overlapping bioacoustic signals.
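The abstract mentions two ideas that are easier to follow with a small example: the scale-invariant signal-to-distortion ratio (SI-SDR) and the search over (output, target) source order permutations. The sketch below is illustrative only and is not the authors' implementation. It uses the standard SI-SDR definition and, as an assumption, uses mean SI-SDR as the matching criterion, whereas the abstract states only that training uses an objective motivated by perceptual audio quality. The function names `si_sdr` and `best_permutation` and the synthetic sine-wave "sources" are hypothetical.

```python
import itertools
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for equal-length sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    # Project the estimate onto the target to obtain the optimally scaled reference.
    scale = dot / (target_energy + eps)
    projection = [scale * t for t in target]
    # Whatever is left after removing the projection counts as distortion.
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def best_permutation(estimates, targets):
    """Return (best mean SI-SDR, order), where estimates[order[i]] is matched to targets[i]."""
    n = len(targets)
    best_score, best_order = -math.inf, None
    # Exhaustive search is cheap for the two- or three-source mixtures considered here.
    for order in itertools.permutations(range(n)):
        score = sum(si_sdr(estimates[j], targets[i]) for i, j in enumerate(order)) / n
        if score > best_score:
            best_score, best_order = score, order
    return best_score, best_order


# Toy data (assumption): two synthetic sources, with estimates returned in swapped order.
source_a = [math.sin(2 * math.pi * 5 * k / 100) for k in range(100)]
source_b = [math.sin(2 * math.pi * 11 * k / 100) for k in range(100)]
estimates = [
    [0.5 * b + 0.01 * a for a, b in zip(source_a, source_b)],  # rescaled source_b plus slight leakage
    [2.0 * a + 0.02 * b for a, b in zip(source_a, source_b)],  # rescaled source_a plus slight leakage
]
score, order = best_permutation(estimates, [source_a, source_b])
print(order, round(score, 2))
```

Because SI-SDR ignores the overall gain of the estimate, the rescaled outputs still score highly once they are paired with the correct reference. The permutation search recovers that pairing even though the model's output order is arbitrary.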

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/655a/8648737/51d27b1e4e53/41598_2021_2790_Fig1_HTML.jpg
