Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries.

Affiliation

Centre for Language and Speech Technology, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, The Netherlands.

Publication information

J Acoust Soc Am. 2010 Feb;127(2):1084-95. doi: 10.1121/1.3277194.

DOI: 10.1121/1.3277194
PMID: 20136229

Abstract

Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve similar performance in terms of percentage correct boundary detection. Nevertheless, unsupervised segmentation algorithms are not able to perfectly reproduce manually obtained reference transcriptions. This paper investigates fundamental problems for unsupervised segmentation algorithms by comparing a phone segmentation obtained using only the acoustic information present in the signal with a reference segmentation created by human transcribers. The analyses of the output of an unsupervised speech segmentation method that uses acoustic change to hypothesize boundaries showed that acoustic change is a fairly good indicator of segment boundaries: over two-thirds of the hypothesized boundaries coincide with segment boundaries. Statistical analyses showed that the errors are related to segment duration, sequences of similar segments, and inherently dynamic phones. In order to improve unsupervised automatic speech segmentation, current one-stage bottom-up segmentation methods should be expanded into two-stage segmentation methods that are able to use a mix of bottom-up information extracted from the speech signal and automatically derived top-down information. In this way, unsupervised methods can be improved while remaining flexible and language-independent.
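
The abstract reports that over two-thirds of the hypothesized boundaries coincide with reference segment boundaries. A minimal sketch of how such a comparison might be scored is given below. It is not the authors' procedure: the 20 ms tolerance window, the greedy one-to-one matching, and the function names are all assumptions made for illustration, since the abstract does not specify them. Times are integer milliseconds.

```python
from bisect import bisect_left


def count_matches(hypothesized, reference, tolerance_ms=20):
    """Count hypothesized boundaries that fall within tolerance_ms of a
    reference boundary. Each reference boundary can be matched only once.
    The 20 ms default is an assumed value, not taken from the abstract."""
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for t in sorted(hypothesized):
        # First reference boundary that is not earlier than t - tolerance_ms.
        j = bisect_left(ref, t - tolerance_ms)
        # Scan forward through the tolerance window for an unused boundary.
        while j < len(ref) and ref[j] <= t + tolerance_ms:
            if not used[j]:
                used[j] = True
                hits += 1
                break
            j += 1
    return hits


def boundary_scores(hypothesized, reference, tolerance_ms=20):
    """Return (precision, recall): the share of hypothesized boundaries that
    coincide with a reference boundary, and the share of reference
    boundaries that were found."""
    hits = count_matches(hypothesized, reference, tolerance_ms)
    precision = hits / len(hypothesized) if hypothesized else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up boundary times in milliseconds, for illustration only.
    hyp = [100, 255, 400, 610]
    ref = [95, 250, 330, 600, 700]
    print(boundary_scores(hyp, ref))
```

In this sketch, the first value returned corresponds to the kind of figure quoted in the abstract (the fraction of hypothesized boundaries that line up with reference boundaries), while the second indicates how many reference boundaries were missed.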

Similar articles

1
Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries.
J Acoust Soc Am. 2010 Feb;127(2):1084-95. doi: 10.1121/1.3277194.
2
Segmenting words from natural speech: subsegmental variation in segmental cues.
J Child Lang. 2010 Jun;37(3):513-43. doi: 10.1017/S0305000910000085. Epub 2010 Mar 22.
3
The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise.
Ear Hear. 2008 Dec;29(6):838-52. doi: 10.1097/AUD.0b013e31818005bd.
4
Temporal structure of spoken-word recognition in Croatian in light of the cohort theory.
Brain Lang. 1999;68(1-2):95-103. doi: 10.1006/brln.1999.2076.
5
A comparison of automatic and human speech recognition in null grammar.
J Acoust Soc Am. 2012 Mar;131(3):EL256-61. doi: 10.1121/1.3684744.
6
Initialization method for speech separation algorithms that work in the time-frequency domain.
J Acoust Soc Am. 2010 Apr;127(4):EL121-6. doi: 10.1121/1.3310248.
7
Words in puddles of sound: modelling psycholinguistic effects in speech segmentation.
J Child Lang. 2010 Jun;37(3):545-64. doi: 10.1017/S0305000909990511. Epub 2010 Mar 22.
8
Rhythm measures and dimensions of durational variation in speech.
J Acoust Soc Am. 2011 May;129(5):3258-70. doi: 10.1121/1.3559709.
9
A spectral/temporal method for robust fundamental frequency tracking.
J Acoust Soc Am. 2008 Jun;123(6):4559-71. doi: 10.1121/1.2916590.
10
The effects of stress and statistical cues on continuous speech segmentation: an event-related brain potential study.
Brain Res. 2006 Dec 6;1123(1):168-78. doi: 10.1016/j.brainres.2006.09.046. Epub 2006 Oct 24.