An Anechoic, High-Fidelity, Multidirectional Speech Corpus.

Author information

Miller Margaret K, Delaram Vahid, Trine Allison, Ananthanarayana Rohit M, Buss Emily, Monson Brian B, Stecker G Christopher

Affiliations

Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE.

Department of Speech & Hearing Science, University of Illinois Urbana-Champaign.

Publication information

J Speech Lang Hear Res. 2025 Jan 2;68(1):411-418. doi: 10.1044/2024_JSLHR-24-00296. Epub 2024 Dec 2.

Abstract

INTRODUCTION

We currently lack speech testing materials faithful to broader aspects of real-world auditory scenes such as speech directivity and extended high frequency (EHF; > 8 kHz) content that have demonstrable effects on speech perception. Here, we describe the development of a multidirectional, high-fidelity speech corpus using multichannel anechoic recordings that can be used for future studies of speech perception in complex environments by diverse listeners.

DESIGN

Fifteen male and 15 female talkers (21.3-60.5 years) recorded Bamford-Kowal-Bench (BKB) Standard Sentence Test lists, digits 0-10, and a 2.5-min unscripted narrative. Recordings were made in an anechoic chamber with 17 free-field condenser microphones spanning 0°-180° azimuth angle around the talker using a 48 kHz sampling rate.
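As a rough check on what these recording parameters imply, the sketch below computes the Nyquist frequency for the 48 kHz sampling rate and the width of the EHF band (> 8 kHz) that such recordings can represent. It also lists microphone azimuths under an assumption that the abstract does not state: that the 17 microphones are evenly spaced across 0°-180° with both endpoints included. The function names are illustrative only.

```python
SAMPLE_RATE_HZ = 48_000   # sampling rate given in the abstract
EHF_LOWER_HZ = 8_000      # EHF is defined as > 8 kHz in the abstract
NUM_MICS = 17             # microphone count given in the abstract


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def evenly_spaced_azimuths(num_mics: int,
                           start_deg: float = 0.0,
                           stop_deg: float = 180.0) -> list[float]:
    """ASSUMPTION (not from the source): uniform spacing, endpoints included."""
    step = (stop_deg - start_deg) / (num_mics - 1)
    return [start_deg + i * step for i in range(num_mics)]


nyquist = nyquist_hz(SAMPLE_RATE_HZ)
ehf_bandwidth_hz = nyquist - EHF_LOWER_HZ   # EHF range captured below Nyquist
azimuths = evenly_spaced_azimuths(NUM_MICS)

print(f"Nyquist frequency: {nyquist:.0f} Hz")
print(f"EHF band captured: {EHF_LOWER_HZ}-{nyquist:.0f} Hz ({ehf_bandwidth_hz:.0f} Hz wide)")
print(f"Assumed azimuth step: {azimuths[1] - azimuths[0]} degrees")
```

If the actual microphone layout is not uniform, only the azimuth list changes; the Nyquist and EHF figures follow directly from the stated sampling rate.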

RESULTS

Recordings resulted in a large corpus containing four BKB lists, 10 digits, and narratives produced by 30 talkers, and an additional 17 BKB lists (21 total) produced by a subset of six talkers.
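The BKB list counts above can be tallied with a short sketch. It uses only the figures stated in the abstract and assumes that every list was captured on all 17 microphone channels; the number of sentences per list is not given here, so the sketch counts talker-list recordings rather than sentences.

```python
NUM_TALKERS = 30            # all talkers recorded the base set of lists
BASE_LISTS_PER_TALKER = 4   # BKB lists recorded by every talker
SUBSET_TALKERS = 6          # talkers who recorded additional lists
EXTRA_LISTS_PER_TALKER = 17 # additional BKB lists for the subset
NUM_MICS = 17               # ASSUMPTION: every list was captured on all channels

# Lists available for each talker in the six-talker subset (21 per the abstract).
lists_per_subset_talker = BASE_LISTS_PER_TALKER + EXTRA_LISTS_PER_TALKER

# Total talker-by-list recordings across the corpus.
talker_list_recordings = (NUM_TALKERS * BASE_LISTS_PER_TALKER
                          + SUBSET_TALKERS * EXTRA_LISTS_PER_TALKER)

# Channel-level recordings if each list exists once per microphone.
channel_level_recordings = talker_list_recordings * NUM_MICS

print(lists_per_subset_talker, talker_list_recordings, channel_level_recordings)
```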

CONCLUSIONS

The goal of this study was to create an anechoic, high-fidelity, multidirectional speech corpus using standard speech materials. More naturalistic narratives, useful for the creation of babble noise and speech maskers, were also recorded. A large group of 30 talkers permits testers to select speech materials based on talker characteristics relevant to a specific task. The resulting speech corpus allows for more diverse and precise speech recognition testing, including testing effects of speech directivity and EHF content. Recordings are publicly available.



