FluencyBank Timestamped: An Updated Data Set for Disfluency Detection and Automatic Intended Speech Recognition.

Affiliation

University of Michigan.

Publication information

J Speech Lang Hear Res. 2024 Nov 7;67(11):4203-4215. doi: 10.1044/2024_JSLHR-24-00070. Epub 2024 Oct 8.

DOI: 10.1044/2024_JSLHR-24-00070
PMID: 39378266
Abstract

PURPOSE

This work introduces updated transcripts, disfluency annotations, and word timings for FluencyBank, which we refer to as FluencyBank Timestamped. This data set will enable the thorough analysis of how speech processing models (such as speech recognition and disfluency detection models) perform when evaluated with typical speech versus speech from people who stutter (PWS).

METHOD

We update the FluencyBank data set, which includes audio recordings from adults who stutter, to explore the robustness of speech processing models. Our update (semi-automated with manual review) includes new transcripts with timestamps and disfluency labels corresponding to each token in the transcript. Our disfluency labels capture typical disfluencies (filled pauses, repetitions, revisions, and partial words), and we explore how speech model performance compares for Switchboard (typical speech) and FluencyBank Timestamped. We present benchmarks for three speech tasks: intended speech recognition, text-based disfluency detection, and audio-based disfluency detection. For the first task, we evaluate how well Whisper performs for intended speech recognition (i.e., transcribing speech without disfluencies). For the next tasks, we evaluate how well a Bidirectional Encoder Representations from Transformers (BERT) text-based model and a Whisper audio-based model perform for disfluency detection. We select these models, BERT and Whisper, as they have shown high accuracies on a broad range of tasks in their language and audio domains, respectively.
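To make the annotation scheme described above concrete, here is a minimal sketch of a token-level transcript in which every token carries a timestamp and a disfluency label. The `Token` class, its field names, and the label strings are hypothetical and chosen for illustration; they are not the data set's actual schema or file format. Only the four disfluency categories come from the abstract.

```python
from dataclasses import dataclass

# Label vocabulary assumed for illustration; the four disfluency
# categories are the ones named in the abstract.
FLUENT = "fluent"
DISFLUENCY_LABELS = {"filled_pause", "repetition", "revision", "partial_word"}


@dataclass(frozen=True)
class Token:
    """One transcript token with word timing (seconds) and a label."""
    word: str
    start: float
    end: float
    label: str = FLUENT


def intended_speech(tokens):
    """Return the words the speaker intended, i.e. tokens labeled fluent."""
    return [t.word for t in tokens if t.label == FLUENT]


def label_counts(tokens):
    """Count how often each disfluency label occurs in a transcript."""
    counts = {}
    for t in tokens:
        if t.label in DISFLUENCY_LABELS:
            counts[t.label] = counts.get(t.label, 0) + 1
    return counts


# Hypothetical utterance: "um i i wa- want coffee"
example = [
    Token("um", 0.00, 0.21, "filled_pause"),
    Token("i", 0.30, 0.38, "repetition"),
    Token("i", 0.45, 0.52),
    Token("wa-", 0.60, 0.74, "partial_word"),
    Token("want", 0.80, 1.02),
    Token("coffee", 1.10, 1.50),
]

print(intended_speech(example))
print(label_counts(example))
```

Keeping the label on each token, rather than in a separate annotation layer, means the same structure can serve as the target for text-based tagging and, through the timestamps, for audio-based detection.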

RESULTS

For the transcription task, we calculate an intended speech word error rate (isWER) between the model's output and the speaker's intended speech (i.e., speech without disfluencies). We find isWER is comparable between Switchboard and FluencyBank Timestamped, but that Whisper transcribes filled pauses and partial words at higher rates in the latter data set. Within FluencyBank Timestamped, isWER increases with stuttering severity. For the disfluency detection tasks, we find the models detect filled pauses, revisions, and partial words relatively well in FluencyBank Timestamped, but performance drops substantially for repetitions because the models are unable to generalize to the different types of repetitions (e.g., multiple repetitions and sound repetitions) from PWS. We hope that FluencyBank Timestamped will allow researchers to explore closing performance gaps between typical speech and speech from PWS.
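The isWER metric described above can be sketched as an ordinary word error rate whose reference is the annotated transcript with disfluent tokens removed. The abstract does not specify text normalization or scoring details, so the lowercase whitespace tokenization and the `(word, is_disfluent)` input format below are assumptions, and `is_wer` is a hypothetical helper name.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def is_wer(annotated, hypothesis):
    """Intended speech WER: errors against the reference minus disfluencies.

    `annotated` is a list of (word, is_disfluent) pairs (assumed format);
    `hypothesis` is the recognizer output as a plain string.
    """
    intended = [w.lower() for w, disfluent in annotated if not disfluent]
    if not intended:
        raise ValueError("reference contains no intended words")
    hyp_words = hypothesis.lower().split()
    return edit_distance(intended, hyp_words) / len(intended)


# Hypothetical utterance: "um i i wa- want coffee"
annotated = [("um", True), ("i", True), ("i", False),
             ("wa-", True), ("want", False), ("coffee", False)]

print(is_wer(annotated, "i want coffee"))     # model dropped every disfluency
print(is_wer(annotated, "um i want coffee"))  # model kept the filled pause
```

Under this definition a recognizer is penalized for transcribing a filled pause or partial word, since each one counts as an insertion relative to the intended speech.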

CONCLUSIONS

Our analysis shows that there are gaps in speech recognition and disfluency detection performance between typical speech and speech from PWS. We hope that FluencyBank Timestamped will contribute to more advancements in training robust speech processing models.



