Viveros-Muñoz Rhoddy, Huijse Pablo, Vargas Victor, Espejo Diego, Poblete Victor, Arenas Jorge P, Vernier Matthieu, Vergara Diego, Suárez Enrique
Instituto de Acústica, Universidad Austral de Chile, General Lagos 2086, Valdivia, Chile.
Instituto de Informática, Universidad Austral de Chile, General Lagos 2086, Valdivia, Chile.
Data Brief. 2023 Sep 7;50:109552. doi: 10.1016/j.dib.2023.109552. eCollection 2023 Oct.
This paper presents the Synthetic Polyphonic Ambient Sound Source (SPASS) dataset, a publicly available synthetic polyphonic audio dataset. SPASS was designed to train deep neural networks effectively for polyphonic sound event detection (PSED) in urban soundscapes. SPASS contains synthetic recordings from five virtual environments: park, square, street, market, and waterfront. The data collection process consisted of the curation of different monophonic sound sources following a hierarchical class taxonomy, the configuration of the virtual environments with the RAVEN software library, the generation of all stimuli, and the processing of this data to create synthetic recordings of polyphonic sound events with their associated metadata. The dataset contains 5000 audio clips per environment, i.e., 25,000 stimuli of 10 s each, virtually recorded at a sampling rate of 44.1 kHz. This effort is part of the project "Integrated System for the Analysis of Environmental Sound Sources: FuSA System" in the city of Valdivia, Chile, which aims to develop a system for detecting and classifying environmental sound sources through deep Artificial Neural Network (ANN) models.
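The dataset figures stated in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is purely illustrative; the environment names and constants are taken from the abstract, but nothing here reflects SPASS's actual file organization.

```python
# Sanity-check the dataset figures reported in the abstract.
SAMPLE_RATE_HZ = 44_100   # virtual recording sample rate (44.1 kHz)
CLIP_SECONDS = 10         # duration of each stimulus
CLIPS_PER_ENV = 5_000     # audio clips per virtual environment
ENVIRONMENTS = ["park", "square", "street", "market", "waterfront"]

# 5 environments x 5000 clips = 25,000 stimuli in total
total_clips = CLIPS_PER_ENV * len(ENVIRONMENTS)

# Each 10 s clip at 44.1 kHz holds 441,000 samples per channel
samples_per_clip = SAMPLE_RATE_HZ * CLIP_SECONDS

print(total_clips)       # 25000
print(samples_per_clip)  # 441000
```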