Robust Speech Rate Estimation for Spontaneous Speech.

Authors

Wang Dagen, Narayanan Shrikanth S

Affiliation

Viterbi School of Engineering, University of Southern California (USC), Los Angeles, CA 90007 USA. He is now with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA.

Publication

IEEE Trans Audio Speech Lang Process. 2007 Nov 1;15(8):2190-2201. doi: 10.1109/TASL.2007.905178.

DOI: 10.1109/TASL.2007.905178
PMID: 20428476
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2860302/

Abstract

In this paper, we propose a direct method for speech rate estimation from acoustic features without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure to derive speech rate. The proposed algorithm extends the methods of spectral subband correlation by including temporal correlation and the use of prominent spectral subbands for improving the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce some novel components into the algorithm, such as the use of pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which manual phonetic segmentation information and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This result is about a 17% improvement compared to the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database.
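
To make the idea of counting syllable-envelope peaks with a relative threshold more concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a syllable envelope has already been computed (for example from subband correlation), and it uses a generic hysteresis peak picker in place of the paper's relative peak measure. The function names `pick_peaks` and `speech_rate`, the parameter `rel_drop`, and the toy envelope are all hypothetical. Pitch-confidence filtering and the magnifying window are not modeled.

```python
from typing import List, Sequence


def pick_peaks(env: Sequence[float], rel_drop: float = 0.3) -> List[int]:
    """Return indices of envelope peaks using a relative hysteresis rule.

    A running maximum is confirmed as a peak once the envelope falls below
    (1 - rel_drop) times that maximum. A new peak search starts only after
    the envelope rises by rel_drop times the last peak height above the
    valley that followed it. Small ripples on top of one hump are therefore
    merged instead of being counted as separate syllables.
    (Illustrative stand-in, not the paper's relative peak measure.)
    """
    if len(env) == 0:
        return []
    peaks: List[int] = []
    looking_for_peak = True
    cur_max, cur_max_idx = env[0], 0
    cur_min = env[0]
    for i, v in enumerate(env):
        if looking_for_peak:
            if v > cur_max:
                cur_max, cur_max_idx = v, i
            elif v < cur_max * (1.0 - rel_drop):
                # Envelope dropped far enough: confirm the running maximum.
                peaks.append(cur_max_idx)
                looking_for_peak = False
                cur_min = v
        else:
            if v < cur_min:
                cur_min = v
            elif v > cur_min + rel_drop * cur_max:
                # Envelope rose far enough out of the valley: new hump.
                looking_for_peak = True
                cur_max, cur_max_idx = v, i
    # A hump still rising or not yet confirmed at the end is not counted.
    return peaks


def speech_rate(env: Sequence[float], frame_rate_hz: float,
                rel_drop: float = 0.3) -> float:
    """Estimated syllables per second: accepted peaks divided by duration."""
    duration_s = len(env) / frame_rate_hz
    return len(pick_peaks(env, rel_drop)) / duration_s


# Toy envelope: three humps, the middle one with a small ripple (0.75 -> 0.78).
toy_env = [0.0, 1.0, 0.1, 0.8, 0.75, 0.78, 0.1, 0.9, 0.0]
toy_peaks = pick_peaks(toy_env)            # ripple at index 5 is rejected
toy_rate = speech_rate(toy_env, frame_rate_hz=9.0)
```

In this toy case the ripple inside the middle hump never falls 30% below the hump's maximum, so it is merged into that hump rather than counted as a pseudo peak. A real system would tune `rel_drop` on data, as the abstract describes for the paper's own parameters.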
