Cavalcanti Julio Cesar, Englert Marina, Oliveira Miguel, Constantini Ana Carolina
Universidade Estadual de Campinas (UNICAMP), Institute of Language Studies, Campinas - SP, Brazil.
Universidade Federal de São Paulo (UNIFESP), Department of Communication Disorders, São Paulo - SP, Brazil; Centro de Estudos da Voz (CEV), São Paulo - SP, Brazil.
J Voice. 2023 Mar;37(2):162-172. doi: 10.1016/j.jvoice.2020.12.005. Epub 2021 Jan 13.
This study aimed to analyze the effects of microphone choice and audio compression on the acquisition of voice and speech parameters.
Acoustic measures were recorded and compared using a high-quality reference microphone and three test microphones that differed in specifications and acoustic properties. The impact of audio compression was assessed by converting the original uncompressed audio files to the MPEG-1/2 Audio Layer 3 (mp3) format at three bit rates (128 kbps, 64 kbps, and 32 kbps). Eight speakers were recruited in each recording session and asked to produce four sustained vowels: two [a] segments and two [ɛ] segments. The audio was captured simultaneously by the reference and test microphones. The recordings were synchronized and analyzed using the Praat software.
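The compression step described above can be sketched as follows. The abstract does not say which encoder was used; this minimal sketch assumes the ffmpeg command-line tool with its libmp3lame encoder, and the file names are hypothetical. It only builds the commands for the three bit rates and does not run them.

```python
from pathlib import Path

# Bit rates named in the abstract, in kbps.
BITRATES_KBPS = (128, 64, 32)


def mp3_commands(wav_path, bitrates=BITRATES_KBPS):
    """Build one ffmpeg command per bit rate.

    ffmpeg and libmp3lame are assumptions for illustration;
    the study does not name its encoder.
    """
    wav = Path(wav_path)
    commands = []
    for kbps in bitrates:
        # Hypothetical naming scheme: <stem>_<bitrate>k.mp3 next to the input.
        out = wav.with_name(f"{wav.stem}_{kbps}k.mp3")
        commands.append(
            ["ffmpeg", "-i", str(wav),
             "-codec:a", "libmp3lame", "-b:a", f"{kbps}k", str(out)]
        )
    return commands


if __name__ == "__main__":
    for cmd in mp3_commands("speaker01_a.wav"):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` on a machine where ffmpeg is installed.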
Of the eight acoustic parameters assessed (fundamental frequency (f), F1, F2, jitter%, shimmer%, HNR, H1-H2, and CPP), three (f, F2, and jitter%) appeared robust to both the microphone and the audio compression variables. In contrast, HNR, H1-H2, and CPP seemed to be significantly affected by both factors, while shimmer% was sensitive only to audio compression. Moreover, higher compression rates appeared to yield more frequent acoustic distortions than lower rates.
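For readers less familiar with the perturbation measures, the sketch below shows one common definition of local jitter% and shimmer%: the mean absolute difference between consecutive cycles, divided by the mean value, times 100. This is an assumption about the variant; the abstract does not state which Praat settings the authors used, and the function names and input values are made up for illustration.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values,
    relative to the mean value, expressed as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_s):
    """Local jitter%: cycle-to-cycle variation of glottal period durations."""
    return local_perturbation_percent(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer%: cycle-to-cycle variation of peak amplitudes."""
    return local_perturbation_percent(peak_amplitudes)


if __name__ == "__main__":
    # Made-up period durations (seconds) and amplitudes, not study data.
    print(jitter_percent([0.0100, 0.0101, 0.0099, 0.0100]))
    print(shimmer_percent([0.80, 0.82, 0.79, 0.81]))
```

Because shimmer depends on cycle amplitudes and jitter on cycle durations, a processing step that alters amplitudes more than timing would affect the two measures differently.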
Overall, the outcomes suggest that acoustic parameters are influenced by both microphone selection and audio compression, which may have practical implications for the reliability of acoustic analysis.