


HornBase: An audio dataset of car horns in different scenarios and positions.

Authors

Cleyton Aparecido Dim, Nelson Cruz Sampaio Neto, Jefferson Magalhães de Morais

Affiliation

Federal University of Pará, Institute of Exact and Natural Sciences, Rua Augusto Corrêa, 01 - Campus Universitário do Guamá, Belém, Pará, 66.075-110, Brazil.

Publication

Data Brief. 2024 Jul 14;55:110678. doi: 10.1016/j.dib.2024.110678. eCollection 2024 Aug.

DOI: 10.1016/j.dib.2024.110678
PMID: 39100781
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11295613/
Abstract

In recent years, there has been significant growth in the development of Machine Learning (ML) models across various fields, such as image and sound recognition and natural language processing. These models need to be trained on a large enough dataset to ensure that predictions are as accurate as possible. When it comes to models for audio recognition, specifically the detection of car horns, existing datasets are generally not built with the specificities of real traffic scenarios in mind; they are limited to collections of random horns, sometimes sourced from audio streaming sites. There are clear benefits to an ML model trained on data tailored for horn detection. One notable advantage is the potential to embed horn detection in smartphones and smartwatches to aid hearing-impaired individuals while driving and alert them in potentially hazardous situations, thus promoting social inclusion. Given these considerations, we developed a dataset specifically for car horns. The dataset contains 1,080 one-second .wav audio files categorized into two classes: horn and not horn. Data collection followed a carefully established protocol designed to cover different scenarios in a real traffic environment, considering diverse relative positions between the vehicles involved. The protocol defines ten distinct scenarios, incorporating variables inside the car receiving the horn: the presence of internal conversations, music, open or closed windows, engine status (on or off), and whether the car is stationary or in motion. There are also variations associated with the vehicle emitting the horn, such as its relative position (behind, alongside, or in front of the receiving vehicle) and the type of horn used: a short honk, a prolonged one, or a rhythmic pattern of three quick honks.
The data collection process started with simultaneous audio recordings on two smartphones positioned inside the receiving vehicle, capturing all scenarios in a single audio file on each device. A 400-meter route was defined in a controlled area so that the recordings could be carried out safely. For each scenario, the route was covered while the different types of horns were emitted from the distinct relative positions, and the route was then restarted for the next scenario. After the collection phase, preprocessing involved manually cutting each horn sound into multiple one-second windows and saving them as PCM stereo .wav files with 16-bit depth and a 44.1 kHz sampling rate. For each horn clip, a corresponding non-horn clip was cut from nearby audio, ensuring a balanced dataset. The dataset was designed for use with various machine learning algorithms, whether for detecting horns with the binary labels or for classifying different horn patterns by rederiving labels from the file nomenclature. In technical validation, classification was performed with a convolutional neural network trained on spectrograms of the dataset's audio, achieving an average accuracy of 89% across 100 trained models.
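The windowing and file format described above can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction, not code from the HornBase release: the helper names, file names, and the synthetic tone standing in for a recorded horn are all assumptions.

```python
# Sketch of the described preprocessing: cut a longer recording into
# one-second windows and save each as a 16-bit PCM stereo .wav at 44.1 kHz.
import math
import struct
import wave

SAMPLE_RATE = 44100   # Hz, as used by the dataset
WINDOW_SECONDS = 1    # each clip is one second long

def write_stereo_wav(path, left, right):
    """Write two equal-length lists of int16 samples as a 16-bit PCM
    stereo .wav file at SAMPLE_RATE."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)          # 2 bytes = 16-bit depth
        wf.setframerate(SAMPLE_RATE)
        frames = b"".join(struct.pack("<hh", l, r)
                          for l, r in zip(left, right))
        wf.writeframes(frames)

def cut_windows(samples, seconds=WINDOW_SECONDS):
    """Split a mono sample list into consecutive windows of `seconds`
    length, dropping any trailing partial window."""
    step = SAMPLE_RATE * seconds
    return [samples[i:i + step]
            for i in range(0, len(samples) - step + 1, step)]

# A 3.5-second 440 Hz tone stands in for a recorded horn (hypothetical data).
tone = [int(10000 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
        for t in range(int(3.5 * SAMPLE_RATE))]
windows = cut_windows(tone)          # three full one-second windows
for i, w in enumerate(windows):
    write_stereo_wav(f"horn_{i:03d}.wav", w, w)  # duplicate mono to stereo
```

In this sketch the trailing half second is discarded; the paper describes manual cutting of each horn event, so the real clips are centered on the sound rather than taken on a fixed grid.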
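The spectrogram features used in the technical validation can be approximated with a small standard-library sketch: frame the clip, apply a Hann window, and take a DFT per frame. The frame and hop sizes are illustrative assumptions (the abstract does not specify them), and a naive DFT is used only to stay dependency-free; a real pipeline would use an FFT library.

```python
# Sketch of magnitude-spectrogram extraction with a naive DFT.
import cmath
import math

def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft_magnitudes(frame):
    """Naive DFT magnitudes, first half of the spectrum only."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2)]

def spectrogram(samples, frame_size=256, hop=256):
    """Per-frame magnitude spectra of a mono sample list."""
    win = hann(frame_size)
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, hop)]
    return [dft_magnitudes([x * w for x, w in zip(f, win)]) for f in frames]

# A 0.1 s 440 Hz tone at 44.1 kHz keeps the naive DFT fast enough to run.
rate = 44100
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate // 10)]
spec = spectrogram(tone)  # 17 frames x 128 frequency bins
```

With 256-point frames at 44.1 kHz, each bin spans about 172 Hz, so the 440 Hz tone peaks around bins 2-3; the CNN in the paper would consume such time-frequency matrices (typically log-scaled) as its input images.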

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/e968c7b4b7a1/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/89e0068a6ecf/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/d7647313e6f7/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/f42dab205dc0/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/6e579641ce7e/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/18d159fe1b3f/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/fb3efb519312/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/3b6312c4a25c/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79ef/11295613/4060b88d3d7e/gr9.jpg

Similar Articles

1. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
2. Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices. J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
3. A multimodal dataset for electric guitar playing technique recognition. Data Brief. 2023 Nov 22;52:109842. doi: 10.1016/j.dib.2023.109842. eCollection 2024 Feb.
4. lassi onk: a system framework to annotate and classify vehicular honk from road traffic. Environ Monit Assess. 2024 Sep 27;196(10):983. doi: 10.1007/s10661-024-13101-3.
5. End-to-End Train Horn Detection for Railway Transit Safety. Sensors (Basel). 2022 Jun 12;22(12):4453. doi: 10.3390/s22124453.
6. Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study. JMIR Pediatr Parent. 2022 Apr 14;5(2):e35406. doi: 10.2196/35406.
7. Factors reducing the detectability of train horns by road users: A laboratory study. Appl Ergon. 2023 May;109:103984. doi: 10.1016/j.apergo.2023.103984. Epub 2023 Feb 8.
8. Database description: Russian fricatives recorded in 198 real speech sentences from 59 speakers. Data Brief. 2023 May 11;48:109205. doi: 10.1016/j.dib.2023.109205. eCollection 2023 Jun.
9. Detecting Forged Audio Files Using "Mixed Paste" Command: A Deep Learning Approach Based on Korean Phonemic Features. Sensors (Basel). 2024 Mar 14;24(6):1872. doi: 10.3390/s24061872.
