Impact of transfer learning methods and dataset characteristics on generalization in birdsong classification.

Author information

Ghani Burooj, Kalkman Vincent J, Planqué Bob, Vellinga Willem-Pier, Gill Lisa, Stowell Dan

Affiliations

Naturalis Biodiversity Center, Leiden, The Netherlands.

Xeno-Canto Foundation, The Hague, The Netherlands.

Publication information

Sci Rep. 2025 May 9;15(1):16273. doi: 10.1038/s41598-025-00996-2.

Abstract

Animal sounds can be recognised automatically by machine learning, and this has an important role to play in biodiversity monitoring. Yet despite increasingly impressive capabilities, bioacoustic species classifiers still exhibit imbalanced performance across species and habitats, especially in complex soundscapes. In this study, we explore the effectiveness of transfer learning in large-scale bird sound classification across various conditions, including single- and multi-label scenarios, and across different model architectures such as CNNs and Transformers. Our experiments demonstrate that both finetuning and knowledge distillation yield strong performance, with cross-distillation proving particularly effective in improving in-domain performance on Xeno-canto data. However, when generalizing to soundscapes, shallow finetuning exhibits superior performance compared to knowledge distillation, highlighting its robustness and constrained nature. Our study further investigates how to use multi-species labels, in cases where these are present but incomplete. We advocate for more comprehensive labeling practices within the animal sound community, including annotating background species and providing temporal details, to enhance the training of robust bird sound classifiers. These findings provide insights into the optimal reuse of pretrained models for advancing automatic bioacoustic recognition.
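To make the abstract's comparison concrete, the sketch below illustrates, under generic assumptions rather than the authors' actual training code, three ingredients it mentions: shallow finetuning of a pretrained backbone (freezing the feature extractor and training only a new classification head), a standard knowledge-distillation loss, and a masked loss for multi-species labels that are present but incomplete. The backbone, class count, and hyperparameters are placeholders, and the paper's specific cross-distillation setup is not reproduced here.

```python
# Hedged sketch only: placeholder model, class count, and hyperparameters,
# not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

NUM_SPECIES = 500  # hypothetical number of bird classes

# (a) Shallow finetuning: freeze the pretrained feature extractor and
#     train only a freshly initialised classification head.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_SPECIES)  # head stays trainable

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

# (b) Knowledge distillation: the student matches the teacher's softened
#     outputs in addition to fitting the ground-truth labels.
def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft KL term (teacher guidance) and a hard CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# (c) Incomplete multi-species labels: treat unannotated species as unknown
#     rather than absent by masking them out of the binary cross-entropy.
def masked_bce_loss(logits, targets, known_mask):
    """targets: 0/1 labels; known_mask: 1 where annotated, 0 where unknown."""
    per_label = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (per_label * known_mask).sum() / known_mask.sum().clamp(min=1)
```

The masking in (c) reflects the abstract's point about incomplete multi-species labels: when background species are not annotated, an unmasked multi-label loss would wrongly penalise the model for detecting them.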


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4377/12064815/bdef4908df81/41598_2025_996_Fig1_HTML.jpg
