Interval and Ratio Scaling of Spectral Audio Descriptors

Author information

Kazazis Savvas, Depalle Philippe, McAdams Stephen

Affiliations

Schulich School of Music, McGill University, Montreal, QC, Canada.

Publication information

Front Psychol. 2022 Mar 30;13:835401. doi: 10.3389/fpsyg.2022.835401. eCollection 2022.

DOI:10.3389/fpsyg.2022.835401

PMID:35432077

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9007158/

Abstract

Two experiments were conducted for the derivation of psychophysical scales of the following audio descriptors: spectral centroid, spectral spread, spectral skewness, odd-to-even harmonic ratio, spectral deviation, and spectral slope. The stimulus sets of each audio descriptor were synthesized and (wherever possible) independently controlled through appropriate synthesis techniques. Partition scaling methods were used in both experiments, and the scales were constructed by fitting well-behaving functions to the listeners' ratings. In the first experiment, the listeners' task was the estimation of the relative differences between successive levels of a particular audio descriptor. The median values of listeners' ratings increased with increasing feature values, which confirmed listeners' abilities to estimate intervals. However, there was a large variability in the reliability of the derived interval scales depending on the stimulus spacing in each trial. In the second experiment, listeners had control over the stimulus values and were asked to divide the presented range of values into perceptually equal intervals, which provides a ratio scale. For every descriptor, the reliability of the derived ratio scales was excellent. The unit of a particular ratio scale was assigned empirically so as to facilitate qualitative comparisons between the scales of all audio descriptors. The construction of psychophysical scales based on univariate stimuli allowed for the establishment of cause-and-effect relations between audio descriptors and perceptual dimensions, contrary to past research that has relied on multivariate stimuli and has only examined the correlations between the two. Most importantly, this study provides an understanding of the ways in which the sensation magnitudes of several audio descriptors are apprehended.
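For readers unfamiliar with the descriptors named in the abstract, the following minimal sketch shows how four of them could be computed for a harmonic tone. It assumes common textbook definitions (amplitude-weighted spectral moments and an energy-based odd-to-even ratio); these are not necessarily the exact formulas used in the study, and the function names `spectral_moments` and `odd_to_even_ratio` are hypothetical.

```python
import math


def spectral_moments(freqs, amps):
    """Return (centroid, spread, skewness) of a line spectrum.

    Assumption: amplitudes are normalized to sum to 1 and used as weights,
    so the descriptors are the first three statistical moments of the
    spectrum treated as a distribution over frequency.
    """
    total = sum(amps)
    if total <= 0:
        raise ValueError("amplitudes must have a positive sum")
    weights = [a / total for a in amps]

    # First moment: amplitude-weighted mean frequency (Hz).
    centroid = sum(f * w for f, w in zip(freqs, weights))
    # Second central moment, square-rooted: standard deviation around the centroid (Hz).
    spread = math.sqrt(sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights)))
    # Third standardized moment: asymmetry around the centroid (unitless).
    if spread > 0:
        skewness = sum((f - centroid) ** 3 * w for f, w in zip(freqs, weights)) / spread ** 3
    else:
        skewness = 0.0
    return centroid, spread, skewness


def odd_to_even_ratio(amps):
    """Energy of odd harmonics divided by energy of even harmonics.

    Assumption: amps[0] is harmonic 1 (the fundamental), amps[1] is harmonic 2, etc.
    """
    odd_energy = sum(a * a for a in amps[0::2])
    even_energy = sum(a * a for a in amps[1::2])
    if even_energy == 0:
        return math.inf
    return odd_energy / even_energy


# Illustrative tone: three equal-amplitude harmonics of a 100 Hz fundamental.
f0 = 100.0
amps = [1.0, 1.0, 1.0]
freqs = [f0 * (k + 1) for k in range(len(amps))]

centroid, spread, skewness = spectral_moments(freqs, amps)
oer = odd_to_even_ratio(amps)
print(centroid, spread, skewness, oer)
```

With a flat three-harmonic spectrum the centroid falls on the second harmonic and the skewness is zero by symmetry. Tilting the amplitudes toward the upper harmonics raises the centroid, which is the kind of univariate manipulation the abstract describes.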
