Awan Shaheen N, Bensoussan Yael, Watts Stephanie, Boyer Micah, Budinsky Robert, Bahr Ruth H
School of Communication Sciences and Disorders & The Communication Technologies Research Center, University of Central Florida, Orlando, FL, United States.
Department of Otolaryngology-Head Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, United States.
Front Digit Health. 2025 Jul 9;7:1610772. doi: 10.3389/fdgth.2025.1610772. eCollection 2025.
The Bridge2AI-Voice Consortium is developing affordable and accessible voice data to assist in the identification of vocal biomarkers of disease in adults and children. Initial experiments were designed to establish voice recording procedures to be used in research labs and clinical settings, as well as in quiet environments outside of the clinic. The focus has been on isolated vowel productions, which provide a vocal signal that is representative of the biomechanics of the larynx within a static vocal tract. The current experiment considers the impact of sentence productions on the measurement of several acoustic parameters.
Voice recordings from 24 individuals representing a wide range of typical and disordered voices were analyzed. Two CAPE-V sentences were recorded via a head-and-torso model using (1) a research-quality, clinical standard microphone/preamplifier/audio interface and (2) smartphones and tablets using their internal microphones and an attached external headset microphone. Mouth-to-microphone distances and environmental noise levels were controlled. Measures of fundamental frequency (F0) and spectral and cepstral measures of voice quality valid for use in sentence contexts were analyzed across recording conditions.
Cepstral peak prominence (CPP) values were sensitive to microphone type, noise, and sentence type conditions. Nevertheless, strong linear relationships were observed across recording methods compared to the clinical standard. Measures of F0 obtained using autocorrelation correlated strongly across recording methods, whereas F0 measures obtained from the CPP (CPP F0) were highly variable and poorly correlated across recording methods and noise conditions. The L/H ratio (a measure of spectral tilt) was significantly affected by recording condition but not by background noise, and L/H ratio measures also correlated strongly across recording methods and noise conditions.
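As a rough illustration of two of the measures named above, the following minimal sketch estimates F0 from the autocorrelation peak of a single frame and computes a low/high spectral energy ratio in dB. It is not the analysis pipeline used in the study. The function names, the 60-500 Hz search range, and the 4 kHz low/high cutoff are assumptions made for this example only; the abstract does not specify them.

```python
import numpy as np


def f0_autocorr(frame, sr, fmin=60.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    fmin/fmax are an assumed search range, not values from the study.
    """
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    # The lag of the strongest peak in the search range is the period in samples.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def lh_ratio_db(frame, sr, cutoff_hz=4000.0):
    """Ratio (dB) of spectral energy below vs. above cutoff_hz.

    The 4 kHz cutoff is an assumption for illustration.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    low = power[freqs < cutoff_hz].sum()
    high = power[freqs >= cutoff_hz].sum()
    return 10.0 * np.log10(low / high)


if __name__ == "__main__":
    # Synthetic 50 ms frame: 200 Hz fundamental plus a weaker second harmonic.
    sr = 16000
    t = np.arange(int(0.05 * sr)) / sr
    frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
    print("F0 estimate (Hz):", f0_autocorr(frame, sr))
    print("L/H ratio (dB):", lh_ratio_db(frame, sr))
```

In practice both measures would be computed frame by frame over voiced portions of a sentence and then summarized, which is one reason microphone frequency response and background noise can shift the spectral measure more than the autocorrelation-based F0.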
Current findings revealed that different recording methods can produce significantly different acoustic measures of voice with sentence-level materials. Microphone characteristics (e.g., frequency response; use of noise cancellation), mouth-to-microphone distances, and background noise conditions can have significant effects on spectral and cepstral assessment of voice. It is therefore essential that recording methods and conditions are explicitly described when designing voice data collection projects and comparing datasets, because these factors may affect voice analysis. Future investigations should evaluate consistency of results among multiple examples of the same device.