Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD.

Author

Wang Kun-Ching

Affiliation

Department of Information Technology & Communication, Shih Chien University, No. 200, University Rd, Neimen Shiang, Kaohsiung 845, Taiwan.

Publication information

Entropy (Basel). 2020 Feb 6;22(2):183. doi: 10.3390/e22020183.

DOI:10.3390/e22020183
PMID:33285958
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516611/
Abstract

This paper proposes a robust approach to audio content classification (ACC), aimed especially at conditions where the noise level varies. Speech, music, and background noise (also called silence) are usually mixed in a noisy audio signal. Based on this observation, we propose a hierarchical ACC approach consisting of three parts: voice activity detection (VAD), speech/music discrimination (SMD), and post-processing. First, entropy-based VAD segments the input signal into noisy audio and noise, even when the noise level varies. Next, one-dimensional (1D) subband energy information (1D-SEI) and two-dimensional (2D) textural image information (2D-TII) are extracted and combined into a hybrid feature set. The hybrid feature set is fed to a support vector machine (SVM) classifier, which performs the hybrid-based SMD. Finally, rule-based post-processing of the segments smooths the output of the ACC system, so that the noisy audio is classified into noise, speech, and music. Experimental results on three available datasets show that the hierarchical ACC system using hybrid feature-based SMD and entropy-based VAD is comparable with existing methods even in a variable noise-level environment. In addition, our tests with the VAD scheme and hybrid features show that the proposed architecture improves the performance of audio content discrimination.
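The first and last stages of the hierarchy described above can be illustrated with a minimal sketch. It assumes spectral entropy as the entropy measure and a sliding majority vote as the post-processing rule; the abstract does not specify either, so these are stand-ins rather than the paper's method. The function names, the 0.5 threshold, and the window width of 3 are all hypothetical, and the SVM-based SMD stage is omitted.

```python
import cmath
import math
from collections import Counter


def spectral_entropy(frame):
    """Normalized Shannon entropy (0..1) of a frame's power spectrum."""
    n = len(frame)
    # Naive DFT over bins 1 .. n/2 - 1 (DC skipped); adequate for short frames.
    power = []
    for k in range(1, n // 2):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0.0:
        return 1.0  # assumption: an all-zero frame is treated as noise-like
    probs = [p / total for p in power]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def entropy_vad(frames, threshold=0.5):
    """Label a frame 'audio' if its spectrum is peaky (low entropy), else 'noise'.

    The threshold is a hypothetical value, not taken from the paper.
    """
    return ["audio" if spectral_entropy(f) < threshold else "noise"
            for f in frames]


def smooth_labels(labels, width=3):
    """Sliding majority vote over segment labels (assumed post-processing rule).

    On a tie, the segment keeps its own label.
    """
    half = width // 2
    out = []
    for i, label in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        best = max(counts.values())
        winners = [lab for lab in counts if counts[lab] == best]
        out.append(label if label in winners else winners[0])
    return out


if __name__ == "__main__":
    n = 64
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]  # one spectral peak
    impulse = [1.0] + [0.0] * (n - 1)                             # flat spectrum
    print(entropy_vad([tone, impulse]))
    print(smooth_labels(["speech", "speech", "music", "speech", "speech"]))
```

A pure tone concentrates its energy in one frequency bin and so has entropy near 0, while a flat spectrum has entropy near 1. Because the measure is normalized by total power, it depends on the shape of the spectrum rather than its level, which is one reason entropy-style features are attractive when the noise level varies.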

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/71652c245397/entropy-22-00183-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/ea6a93889743/entropy-22-00183-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/273c72e7e0bb/entropy-22-00183-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/a96a69560b71/entropy-22-00183-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/b857f19b710f/entropy-22-00183-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/198c5dddceb5/entropy-22-00183-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/5f71732856cf/entropy-22-00183-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/3f149f8b03a1/entropy-22-00183-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4296/7516611/dbea3b6cbdab/entropy-22-00183-g009.jpg

Similar articles

1
Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD.
Entropy (Basel). 2020 Feb 6;22(2):183. doi: 10.3390/e22020183.
2
A novel voice sensor for the detection of speech signals.
Sensors (Basel). 2013 Dec 2;13(12):16533-50. doi: 10.3390/s131216533.
3
Voice activity detection algorithm using perceptual wavelet entropy neighbor slope.
Biomed Mater Eng. 2014;24(6):3295-301. doi: 10.3233/BME-141152.
4
Multilevel hybrid handcrafted feature extraction based depression recognition method using speech.
J Affect Disord. 2024 Nov 1;364:9-19. doi: 10.1016/j.jad.2024.08.002. Epub 2024 Aug 9.
5
A hierarchical approach for speech-instrumental-song classification.
Springerplus. 2013 Oct 17;2(1):526. doi: 10.1186/2193-1801-2-526. eCollection 2013.
6
Multi-Feature Fusion Method Based on EEG Signal and its Application in Stroke Classification.
J Med Syst. 2019 Dec 21;44(2):39. doi: 10.1007/s10916-019-1517-9.
7
A hierarchical framework approach for voice activity detection and speech enhancement.
ScientificWorldJournal. 2014;2014:723643. doi: 10.1155/2014/723643. Epub 2014 May 12.
8
A hybrid technique for speech segregation and classification using a sophisticated deep neural network.
PLoS One. 2018 Mar 20;13(3):e0194151. doi: 10.1371/journal.pone.0194151. eCollection 2018.
9
A Hybrid Speech Enhancement Algorithm for Voice Assistance Application.
Sensors (Basel). 2021 Oct 23;21(21):7025. doi: 10.3390/s21217025.
10
Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross-correlation bands.
Technol Health Care. 2018;26(S1):281-289. doi: 10.3233/THC-174615.

Cited by

1
Applications of Entropy in Data Analysis and Machine Learning: A Review.
Entropy (Basel). 2024 Dec 23;26(12):1126. doi: 10.3390/e26121126.
2
Music Classification Method Using Big Data Feature Extraction and Neural Networks.
J Environ Public Health. 2022 Jul 30;2022:5749359. doi: 10.1155/2022/5749359. eCollection 2022.

References

1
Deep learning.
Nature. 2015 May 28;521(7553):436-44. doi: 10.1038/nature14539.
2
Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.
Sensors (Basel). 2015 Jan 14;15(1):1458-78. doi: 10.3390/s150101458.
3
The feature extraction based on texture image information for emotion sensing in speech.
Sensors (Basel). 2014 Sep 9;14(9):16692-714. doi: 10.3390/s140916692.
4
Content-based audio classification and retrieval by support vector machines.
IEEE Trans Neural Netw. 2003;14(1):209-15. doi: 10.1109/TNN.2002.806626.