Anikin Andrey
Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden.
Atten Percept Psychophys. 2025 Apr 28. doi: 10.3758/s13414-025-03060-3.
Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. The first uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, whereas the second, faster algorithm starts from a conventional spectrogram obtained with the short-time Fourier transform (STFT); both explain about 50% of the variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is 50 to 200 Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
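As a rough illustration of the faster, STFT-based idea, the following Python sketch computes a conventional magnitude spectrogram, takes a Fourier transform along time in each frequency bin to obtain temporal modulation frequencies, and reports the proportion of modulation energy between the simple 50 and 200 Hz cutoff points. This is not the soundgen implementation of modulationSpectrum(): the function name roughness_from_stft, the 10 ms Hann window, the 1 ms step, and the energy-proportion definition are all assumptions made for this example, and the lognormal weighting option is not shown.

```python
import numpy as np

def roughness_from_stft(signal, sr, win_ms=10.0, hop_ms=1.0,
                        rough_range=(50.0, 200.0)):
    """Proportion of modulation energy inside rough_range (Hz).

    Hypothetical helper for illustration; all parameter defaults are assumptions.
    """
    win = int(round(sr * win_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    window = np.hanning(win)

    # Conventional magnitude spectrogram: rows = frames, columns = frequency bins
    starts = np.arange(0, len(signal) - win + 1, hop)
    frames = np.stack([signal[s:s + win] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1))

    # FFT along time within each frequency bin gives temporal modulation power
    mod_power = np.abs(np.fft.rfft(spec, axis=0)) ** 2
    mod_freqs = np.fft.rfftfreq(spec.shape[0], d=hop / sr)

    total = mod_power.sum()  # includes the 0 Hz (unmodulated) component
    if total == 0:
        return 0.0
    in_band = (mod_freqs >= rough_range[0]) & (mod_freqs <= rough_range[1])
    return float(mod_power[in_band].sum() / total)

# Synthetic check: a steady 1 kHz tone vs. the same tone amplitude-modulated at 70 Hz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
rough_tone = (1 + np.cos(2 * np.pi * 70 * t)) * tone
print(roughness_from_stft(tone, sr), roughness_from_stft(rough_tone, sr))
```

The 1 ms step gives a frame rate of 1000 Hz, so modulation frequencies up to 500 Hz can be resolved, which comfortably covers the 50 to 200 Hz range. The short analysis window is a deliberate choice in this sketch: a long window would resolve amplitude modulation as spectral sidebands rather than as temporal fluctuation within a frequency bin.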