Siedenburg Kai, Müllensiefen Daniel
Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany.
Department of Psychology, Goldsmiths, University of London, London, UK.
Front Psychol. 2017 Apr 26;8:639. doi: 10.3389/fpsyg.2017.00639. eCollection 2017.
There is evidence from a number of recent studies that most listeners can extract information related to song identity, emotion, or genre from music excerpts with durations in the range of tenths of seconds. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of feature that listeners make use of when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. Hence, the goal of this study was to develop a method that allows us to explore to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants; Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients together with their temporal derivatives. To predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in a partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets, mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability, best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date yielded much better generalization performance and explained up to 50% of the shared variance between observations and model predictions.
Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
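The modeling approach described in the abstract (per-feature distances between clips used as predictor variables in a partial least-squares regression) can be sketched in plain numpy. The minimal NIPALS-style PLS1 routine below, the 16 hypothetical clip feature vectors, and the synthetic similarity judgments are all illustrative assumptions, not the paper's actual feature sets, data, or software.

```python
import numpy as np

def pairwise_distances(features):
    """Euclidean distances between all clip pairs (upper triangle order)."""
    n = len(features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return np.array([np.linalg.norm(features[i] - features[j]) for i, j in pairs])

def pls1_fit(X, y, n_components=3):
    """Minimal NIPALS PLS1 regression; returns (intercept, coefficient vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector from X-y covariance
        w /= np.linalg.norm(w)
        t = Xk @ w                    # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.inv(P.T @ W) @ q
    return y_mean - x_mean @ B, B

# Toy demonstration: 16 clips, each with 5 hypothetical timbre-like features,
# giving 16 * 15 / 2 = 120 clip pairs, as in the study's 16-clip design.
rng = np.random.default_rng(0)
clips = rng.normal(size=(16, 5))
dist_per_feature = np.stack(
    [pairwise_distances(clips[:, [k]]) for k in range(5)], axis=1
)  # one distance column per feature: shape (120, 5)
true_w = np.array([1.0, 0.5, 0.0, 0.0, 0.2])   # assumed "ground truth" weights
similarity = -dist_per_feature @ true_w         # synthetic similarity judgments
b0, B = pls1_fit(dist_per_feature, similarity, n_components=3)
pred = b0 + dist_per_feature @ B
r2 = 1 - np.sum((similarity - pred) ** 2) / np.sum((similarity - similarity.mean()) ** 2)
```

A sparse PLS model with few components, as here, mirrors the study's finding that a small number of distance features suffices; real use would substitute measured timbre descriptors (e.g., MFCCs and their derivatives) for the random toy features.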