Convex weighting criteria for speaking rate estimation.

Authors

Jiao Yishan, Berisha Visar, Tu Ming, Liss Julie

Affiliation

Department of Speech and Hearing Science, Arizona State University.

Publication

IEEE/ACM Trans Audio Speech Lang Process. 2015 Sep;23(9):1421-1430. doi: 10.1109/TASLP.2015.2434213.

DOI:10.1109/TASLP.2015.2434213
PMID:26167516
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4497798/
Abstract

Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose the speaking rate estimation problem as that of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Rather, the proposed method aims to learn an optimal weighting function that can be directly applied to time-frequency features in a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting functions and an adaptation strategy to customize the approach to a particular speaker using minimal training. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech rate estimation, the results show a high correlation between the estimated speaking rate and ground truth values.
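
The abstract does not spell out the two convex cost functions, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a linear weighting vector `w` applied to per-frame time-frequency features, so that the density at frame t is `x_t @ w`, and it swaps in a ridge-regularized least-squares cost (one simple convex choice) that asks the summed density of each training utterance to match its syllable count. The function names, the feature dimension, the frame period, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_weights(features, counts, ridge=1e-3):
    """Fit a weighting vector w with a convex (ridge least-squares) cost.

    features: list of (T_i, D) arrays of per-frame time-frequency features.
    counts:   list of target unit counts (e.g. syllables), one per utterance.
    """
    # Summing the density x_t @ w over frames equals (sum_t x_t) @ w,
    # so each utterance contributes one linear equation in w.
    S = np.stack([X.sum(axis=0) for X in features])  # shape (N, D)
    c = np.asarray(counts, dtype=float)
    dim = S.shape[1]
    # Closed-form minimizer of ||S w - c||^2 + ridge * ||w||^2.
    return np.linalg.solve(S.T @ S + ridge * np.eye(dim), S.T @ c)


def density(X, w):
    """Temporal density: one value per frame."""
    return X @ w


def speaking_rate(X, w, frame_period_s, start=0, stop=None):
    """Sum the density over frames [start, stop) and divide by the duration."""
    d = density(X, w)[start:stop]
    if len(d) == 0:
        raise ValueError("empty interval")
    return d.sum() / (len(d) * frame_period_s)


# Synthetic illustration only: random non-negative "features" and targets
# generated from a known weighting, so the fit can be checked against it.
rng = np.random.default_rng(0)
D = 8  # assumed number of feature channels
true_w = rng.uniform(0.01, 0.05, size=D)
features = [rng.random((int(T), D)) for T in rng.integers(50, 150, size=40)]
counts = [X.sum(axis=0) @ true_w for X in features]

w = fit_weights(features, counts, ridge=1e-6)
rate = speaking_rate(features[0], w, frame_period_s=0.01)  # units per second
```

Because the cost is quadratic in `w`, the fit reduces to a single linear solve. Under the same linear assumption, speaker adaptation could be sketched by refitting on a few utterances from the target speaker, although the abstract does not describe how the authors do this.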
