Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia.
Department of Neurology, Royal Melbourne Hospital, Melbourne, Victoria, Australia.
Folia Phoniatr Logop. 2024;76(4):372-385. doi: 10.1159/000535152. Epub 2023 Nov 16.
Smart devices are widely available and capable of quickly recording and uploading speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet a growing body of literature shows that acoustic metrics vary widely in their robustness to differences between recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. The aim of our study was to assess whether adding consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone.
Speech was simultaneously recorded by a high-quality reference microphone commonly used in research and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated from each microphone's recordings to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to the deviations in speech expected in various neurological conditions. Each microphone's response to speech and to silence was characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical speakers and people with multiple sclerosis, using metrics from each tested microphone was compared to that obtained with the reference microphone.
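The abstract does not state how the agreement intervals were computed. As a minimal sketch, assuming a Bland-Altman-style 95% limits-of-agreement calculation on paired measurements of one feature, the comparison against a disease-expected deviation could look like the following. The function names, the `z` multiplier, and the threshold check are illustrative assumptions, not the authors' method.

```python
from statistics import mean, stdev


def limits_of_agreement(reference, test, z=1.96):
    """Return (bias, lower, upper) for paired measurements of one feature.

    Assumption: Bland-Altman-style limits, i.e. mean difference
    plus or minus z times the sample standard deviation of the differences.
    """
    if len(reference) != len(test):
        raise ValueError("paired measurements must have equal length")
    if len(reference) < 2:
        raise ValueError("at least two paired measurements are required")
    # Difference of each test-microphone value from the reference value.
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


def within_expected_deviation(lower, upper, expected_deviation):
    """True if the widest limit stays below the disease-expected deviation.

    Assumption: a single positive threshold per feature, in the same unit.
    """
    return max(abs(lower), abs(upper)) < expected_deviation
```

Under this sketch, a feature would count as robust to the microphone change only when `within_expected_deviation` returns `True`, meaning that device-related measurement differences are smaller than the change a neurological condition is expected to produce.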
The two consumer-grade plug-and-play microphones favored high frequencies (mean center of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal-to-noise ratio ≤ -4.2 dB) when compared to the reference microphone. Between the consumer-grade microphones, differences in relative noise were closely related to the distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤0.06 Hz), f0 instability (f0 CoV, agreement interval ≤0.05%), and tracking of second formant movement (F2 slope, agreement interval ≤1.4 Hz/ms). Agreement between microphones was poor for other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play microphones than with the reference microphone.
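For readers unfamiliar with the quantities reported above, the following is a rough sketch of how a spectral center of gravity, a signal-to-noise ratio in dB, and an f0 coefficient of variation are commonly defined. The abstract does not specify the authors' analysis software or settings, so the function names, the power weighting of 2, and the use of a silent segment as the noise estimate are all assumptions.

```python
import math
from statistics import mean, stdev


def center_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Spectral center of gravity: frequency averaged with weights |S(f)|**power.

    Assumption: the spectrum is given as matching lists of bin frequencies
    and magnitudes; power=2 weights each bin by its spectral power.
    """
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_samples, silence_samples):
    """Ratio of RMS amplitude during speech to RMS during silence, in dB.

    Assumption: the silent segment serves as the noise estimate, so the
    numerator still contains noise alongside the speech signal.
    """
    return 20.0 * math.log10(rms(speech_samples) / rms(silence_samples))


def f0_cov_percent(f0_track_hz):
    """f0 coefficient of variation: standard deviation over mean, in percent."""
    return 100.0 * stdev(f0_track_hz) / mean(f0_track_hz)
```

With these definitions, a microphone that emphasizes high frequencies raises the center of gravity relative to the reference, and a noisier recording lowers the signal-to-noise ratio, which matches the direction of the differences reported above.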
Measurement of f0 and F2 slope was robust to variation in recording equipment, while other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including those using remote or self-recording, possibly with a small loss in accuracy and statistical power, as observed in the current study.