Xie Baijun, Kim Jonathan C, Park Chung Hyuk
Department of Biomedical Engineering, The George Washington University, Washington, DC 20052, USA.
Appl Sci (Basel). 2020 Feb;10(3). doi: 10.3390/app10030902. Epub 2020 Jan 30.
This paper presents a method for extracting novel spectral features based on a sinusoidal model. The method characterizes the spectral shapes of audio signals using spectral peaks in frequency sub-bands. The extracted features are evaluated on the task of predicting the levels of two emotional dimensions, arousal and valence. Principal component regression, partial least squares regression, and deep convolutional neural network (CNN) models serve as the prediction models. The experimental results indicate that the proposed features carry additional spectral information that common baseline features may not include. Because the quality of audio signals, especially timbre, plays a major role in the perception of emotional valence in music, including the presented features contributes to reducing the prediction error.
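As a rough illustration of what "spectral peaks in frequency sub-bands" can mean in practice, the sketch below picks the strongest magnitude-spectrum peak in each of several sub-bands of a single windowed frame. It is not the authors' feature extractor: the function name `subband_peak_features`, the equal-width band split, the Hann window, and the default values of `n_bands` and `n_fft` are all assumptions made for this example, and a full sinusoidal model would also track peaks across frames.

```python
import numpy as np


def subband_peak_features(signal, sample_rate, n_bands=8, n_fft=2048):
    """Return (peak_freqs_hz, peak_mags_db): the strongest peak per sub-band.

    Hypothetical helper for illustration only; the band layout and frame
    handling are assumptions, not the method described in the paper.
    """
    # Analyse one frame: take the first n_fft samples and apply a Hann window.
    frame = np.asarray(signal, dtype=float)[:n_fft]
    frame = frame * np.hanning(len(frame))

    # Magnitude spectrum and the frequency (Hz) of each FFT bin.
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    # Assumption: split the bins into n_bands equal-width sub-bands.
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)

    peak_freqs = np.empty(n_bands)
    peak_mags = np.empty(n_bands)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        k = lo + int(np.argmax(spectrum[lo:hi]))  # strongest bin in this band
        peak_freqs[b] = freqs[k]
        peak_mags[b] = 20.0 * np.log10(spectrum[k] + 1e-12)  # magnitude in dB
    return peak_freqs, peak_mags
```

The two returned vectors could be concatenated into a per-frame feature vector and passed to a regression model such as the ones named in the abstract; that downstream step is left out here.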