Anikin Andrey, Herbst Christian T
Division of Cognitive Science, Lund University, Lund, Sweden.
ENES Bioacoustics Research Laboratory, Université Jean Monnet Saint-Étienne, Saint-Étienne, France.
Philos Trans R Soc Lond B Biol Sci. 2025 Apr 3;380(1923):20240003. doi: 10.1098/rstb.2024.0003.
We address two research applications in this methodological review: starting from an audio recording, the goal may be to characterize nonlinear phenomena (NLP) at the level of voice production or to test their perceptual effects on listeners. A crucial prerequisite for this work is the ability to detect NLP in acoustic signals, which can then be correlated with biologically relevant information about the caller and with listeners' reaction. NLP are often annotated manually, but this is labour-intensive and not very reliable, although we describe potentially helpful advanced visualization aids such as reassigned spectrograms and phasegrams. Objective acoustic features can also be useful, including general descriptives (harmonics-to-noise ratio, cepstral peak prominence, vocal roughness), statistics derived from nonlinear dynamics (correlation dimension) and NLP-specific measures (depth of modulation and subharmonics). On the perception side, playback studies can greatly benefit from tools for directly manipulating NLP in recordings. Adding frequency jumps, amplitude modulation and subharmonics is relatively straightforward. Creating biphonation, imitating chaos or removing NLP from a recording are more challenging, but feasible with parametric voice synthesis. We describe the most promising algorithms for analysing and manipulating NLP and provide detailed examples with audio files and R code in supplementary material. This article is part of the theme issue 'Nonlinear phenomena in vertebrate vocalizations: mechanisms and communicative functions'.
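As an illustration of why adding amplitude modulation is described as relatively straightforward, the following is a minimal sketch in Python. It is not the authors' supplementary R code, and every name in it (`sine_tone`, `amplitude_modulate`, `dft_magnitude`, and the parameter values) is a hypothetical choice made for this example. It assumes the simplest case: a pure tone multiplied by a sinusoidal envelope, with the depth of modulation defined as the fraction by which the envelope dips at its troughs.

```python
import math


def sine_tone(freq, duration, sample_rate):
    """Return a unit-amplitude sine wave as a list of samples."""
    n_samples = int(round(duration * sample_rate))
    return [math.sin(2.0 * math.pi * freq * n / sample_rate)
            for n in range(n_samples)]


def amplitude_modulate(signal, sample_rate, mod_freq, depth):
    """Multiply a signal by a sinusoidal envelope.

    depth is assumed to lie in [0, 1]: 0 leaves the signal unchanged,
    1 reduces it to silence at the envelope troughs.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    out = []
    for n, x in enumerate(signal):
        t = n / sample_rate
        # The envelope oscillates between (1 - depth) and 1.
        envelope = 1.0 - depth * 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_freq * t))
        out.append(x * envelope)
    return out


def dft_magnitude(signal, sample_rate, freq):
    """Amplitude of the spectral component at freq (single-bin DFT, no window)."""
    re = sum(x * math.cos(2.0 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


# Illustrative values only: a 400 Hz tone modulated at 50 Hz with depth 0.6.
SAMPLE_RATE = 8000
carrier = sine_tone(400.0, 1.0, SAMPLE_RATE)
modulated = amplitude_modulate(carrier, SAMPLE_RATE, mod_freq=50.0, depth=0.6)

for f in (350.0, 400.0, 450.0):
    print(f, round(dft_magnitude(modulated, SAMPLE_RATE, f), 3))
```

For this idealized case the spectrum can be worked out by hand: the carrier keeps an amplitude of 1 - depth/2, and two sidebands of amplitude depth/4 appear at the carrier frequency plus and minus the modulation frequency. Setting the modulation frequency to half the carrier frequency would place those sidebands at half-integer multiples of the carrier, which is one simple way to approximate subharmonics. Real vocalizations have many harmonics and time-varying pitch, so this sketch only shows the principle.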