Florentino André Luiz, Diniz Eva Laussac, Aquino-Jr Plinio Thomaz
Centro Universitário FEI - Fundação Educacional Inaciana Pe. Saboia de Medeiros, Electrical Engineering, São Bernardo do Campo, 09850-901, Brazil.
UTFPR - Universidade Tecnológica Federal do Paraná, Computer Science, Cornélio Procópio, 86300-000, Brazil.
Sci Data. 2025 Jul 5;12(1):1148. doi: 10.1038/s41597-025-05446-2.
Environmental sound recognition might play a crucial role in the development of autonomous vehicles by mimicking human behavior, particularly in complementing sight and touch to create a comprehensive sensory system. Just as humans rely on auditory cues to detect and respond to critical events such as emergency sirens, honking horns, or the approach of other vehicles and pedestrians, autonomous vehicles equipped with advanced sound recognition capabilities may significantly enhance their situational awareness and decision-making. To support this approach, we extended the UrbanSound8K (US8K) dataset, a benchmark in urban sound classification research, by merging classes deemed irrelevant for autonomous vehicles into a new class named 'background' and adding a 'silence' class sourced from Freesound.org to complement the dataset. This tailored dataset, named UrbanSound8K for Autonomous Vehicles (US8K_AV), contains 4.94 hours of annotated audio comprising 4,908 WAV files distributed across six classes. It supports the development of predictive models that can be deployed on embedded systems such as the Raspberry Pi.
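The class-merging step described above can be sketched as a simple relabeling of the UrbanSound8K metadata. This is a minimal illustration, not the authors' published pipeline: the specific set of classes kept (chosen here so that four kept classes plus 'background' and 'silence' total six) is an assumption for illustration, and the sample rows below are stand-ins for the real `UrbanSound8K.csv` metadata.

```python
# Hypothetical sketch of the class-merging step described in the abstract.
# The exact kept-vs-merged class split is an illustrative assumption;
# consult the published US8K_AV metadata for the authoritative mapping.
import csv
import io

# Classes assumed relevant to autonomous driving (illustrative choice);
# all remaining UrbanSound8K classes collapse into 'background'.
KEPT_CLASSES = {"car_horn", "siren", "engine_idling", "dog_bark"}

def remap_class(original: str) -> str:
    """Map a UrbanSound8K class label to a US8K_AV-style label."""
    return original if original in KEPT_CLASSES else "background"

# Minimal stand-in for rows from the UrbanSound8K metadata CSV.
sample_metadata = io.StringIO(
    "slice_file_name,fold,class\n"
    "100032-3-0-0.wav,5,dog_bark\n"
    "100263-2-0-117.wav,5,children_playing\n"
    "100648-1-0-0.wav,10,car_horn\n"
)

remapped = [
    {**row, "class": remap_class(row["class"])}
    for row in csv.DictReader(sample_metadata)
]
print([r["class"] for r in remapped])  # ['dog_bark', 'background', 'car_horn']
```

Separately collected 'silence' clips (from Freesound.org, per the abstract) would then be appended to the relabeled metadata under their own class.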