Robust Audio Content Classification Using Hybrid-Based SMD and Entropy-Based VAD

Author information

Wang Kun-Ching

Affiliation

Department of Information Technology & Communication, Shih Chien University, No. 200, University Rd, Neimen Shiang, Kaohsiung 845, Taiwan.

Publication information

Entropy (Basel). 2020 Feb 6;22(2):183. doi: 10.3390/e22020183.

DOI:10.3390/e22020183

PMID:33285958

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516611/

Abstract

This paper proposes a robust approach to audio content classification (ACC), particularly under variable noise-level conditions. Speech, music, and background noise (also called silence) are usually mixed in a noisy audio signal. Based on this observation, we propose a hierarchical ACC approach consisting of three parts: voice activity detection (VAD), speech/music discrimination (SMD), and post-processing. First, entropy-based VAD segments the input signal into noisy audio and noise, even when the noise level varies. Next, one-dimensional (1D) subband energy information (1D-SEI) and two-dimensional (2D) textural image information (2D-TII) are extracted and combined into a hybrid feature set. Hybrid-based SMD is then performed by feeding this hybrid feature set into a support vector machine (SVM) classifier. Finally, rule-based post-processing of the segments smooths the output of the ACC system, so that the noisy audio is classified into noise, speech, and music. Experimental results on three available datasets show that the hierarchical ACC system using hybrid feature-based SMD and entropy-based VAD is comparable with existing methods, even in a variable noise-level environment. In addition, our test results with the VAD scheme and hybrid features show that the proposed architecture improves the performance of audio content discrimination.
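The abstract does not specify how the entropy measure is computed, so the following is only a minimal sketch of one common form of entropy-based VAD, not the paper's method. It assumes a per-frame Shannon entropy of the normalized power spectrum and a simple midpoint threshold. The function names, frame length, hop size, and threshold rule are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (bits) of the frame's normalized power spectrum."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    prob = power / (power.sum() + eps)  # treat the spectrum as a distribution
    return float(-np.sum(prob * np.log2(prob + eps)))


def entropy_vad(signal, frame_len=512, hop=256, threshold=None):
    """Return (active, entropies): one boolean and one entropy per frame.

    Assumption: frames with structured content (speech, music) have a more
    peaked spectrum, hence lower entropy, than broadband noise frames.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    entropies = np.array([
        spectral_entropy(signal[i * hop: i * hop + frame_len])
        for i in range(n_frames)
    ])
    if threshold is None:
        # Hypothetical rule: midpoint between the lowest and highest entropy.
        threshold = 0.5 * (entropies.min() + entropies.max())
    return entropies < threshold, entropies
```

Calling `entropy_vad(signal)` on a mono signal returns a per-frame activity mask together with the entropy values. The low-entropy assumption holds for roughly white background noise; strongly colored or tonal noise would need a different threshold rule or measure.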
