Miller Margaret K, Delaram Vahid, Trine Allison, Ananthanarayana Rohit M, Buss Emily, Monson Brian B, Stecker G Christopher
Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE.
Department of Speech & Hearing Science, University of Illinois Urbana-Champaign.
J Speech Lang Hear Res. 2025 Jan 2;68(1):411-418. doi: 10.1044/2024_JSLHR-24-00296. Epub 2024 Dec 2.
Current speech testing materials do not capture broader aspects of real-world auditory scenes, such as speech directivity and extended high-frequency (EHF; > 8 kHz) content, that demonstrably affect speech perception. Here, we describe the development of a multidirectional, high-fidelity speech corpus based on multichannel anechoic recordings, intended for future studies of speech perception in complex environments by diverse listeners.
Fifteen male and 15 female talkers (21.3-60.5 years) recorded Bamford-Kowal-Bench (BKB) Standard Sentence Test lists, digits 0-10, and a 2.5-min unscripted narrative. Recordings were made in an anechoic chamber at a 48-kHz sampling rate, using 17 free-field condenser microphones spanning 0°-180° in azimuth around the talker.
Recordings resulted in a large corpus containing four BKB lists, 10 digits, and narratives produced by 30 talkers, and an additional 17 BKB lists (21 total) produced by a subset of six talkers.
The goal of this study was to create an anechoic, high-fidelity, multidirectional speech corpus using standard speech materials. More naturalistic narratives, useful for the creation of babble noise and speech maskers, were also recorded. A large group of 30 talkers permits testers to select speech materials based on talker characteristics relevant to a specific task. The resulting speech corpus allows for more diverse and precise speech recognition testing, including testing effects of speech directivity and EHF content. Recordings are publicly available.