Automated Speech Audiometry: Can It Work Using Open-Source Pre-Trained Kaldi-NL Automatic Speech Recognition?

Affiliations

Department of Otorhinolaryngology, Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

W.J. Kolff Institute for Biomedical Engineering and Materials Science, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Publication information

Trends Hear. 2024 Jan-Dec;28:23312165241229057. doi: 10.1177/23312165241229057.

Abstract

The digits-in-noise (DIN) test is a practical speech audiometry tool for hearing screening of populations of varying ages and hearing status. The test is usually conducted either by a human supervisor (e.g., a clinician), who scores the listener's spoken responses, or online, where software scores responses entered by the listener. It presents 24 digit triplets in an adaptive staircase procedure and yields a speech reception threshold (SRT). We propose an alternative automated DIN test setup that evaluates spoken responses without a human supervisor, using the open-source automatic speech recognition toolkit Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19-64 years) each completed one DIN + Kaldi-NL test. Their spoken responses were recorded and used to evaluate the transcripts decoded by Kaldi-NL. Study 1 evaluated Kaldi-NL performance through its word error rate (WER), defined here as the summed digit decoding errors in the transcript as a percentage of the total number of digits in the spoken responses. Average WER across participants was 5.0% (range 0-48%, SD = 8.8%), with decoding errors in three triplets per participant on average. Study 2 used bootstrapping simulations to analyze the effect of triplets with Kaldi-NL decoding errors on the DIN test output (SRT). Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that the proposed setup could be feasible for clinical applications.
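The digit-level WER described in Study 1 can be illustrated with a minimal sketch. The abstract does not spell out the exact scoring procedure, so this assumes the conventional definition (substitutions, deletions, and insertions found by edit-distance alignment, divided by the number of spoken digits). The function names `edit_distance` and `digit_wer` and the example triplets are hypothetical and not taken from the study.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, deletions, and
    insertions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def digit_wer(spoken: List[List[str]], decoded: List[List[str]]) -> float:
    """Summed digit errors as a percentage of all spoken digits.

    Assumption: each response is already reduced to its digit tokens,
    and `spoken` and `decoded` are aligned response by response.
    """
    if len(spoken) != len(decoded):
        raise ValueError("spoken and decoded must have the same length")
    total = sum(len(s) for s in spoken)
    if total == 0:
        raise ValueError("no spoken digits to score")
    errors = sum(edit_distance(s, d) for s, d in zip(spoken, decoded))
    return 100.0 * errors / total


# Hypothetical example: three spoken triplets and their decoded transcripts.
spoken = [["2", "5", "9"], ["4", "1", "6"], ["8", "3", "7"]]
decoded = [["2", "5", "9"], ["4", "6"], ["8", "3", "1"]]  # one deletion, one substitution
print(f"WER = {digit_wer(spoken, decoded):.1f}%")  # WER = 22.2%
```

Summing errors and digits over all responses before dividing, rather than averaging per-triplet rates, matches the "summed decoding errors" wording in the abstract; whether the study aligned responses in exactly this way is an assumption.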

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7725/10943752/e39831226994/10.1177_23312165241229057-fig1.jpg
