An Acoustic Measure for Word Prominence in Spontaneous Speech.

Authors

Wang Dagen, Narayanan Shrikanth

Affiliation

The authors are with the Viterbi School of Engineering, University of Southern California (USC), Los Angeles, CA 90007 USA.

Publication

IEEE Trans Audio Speech Lang Process. 2007 Feb 1;15(2):690-701. doi: 10.1109/tasl.2006.881703.

DOI:10.1109/tasl.2006.881703
PMID:20454538
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2864931/
Abstract

This paper reports an algorithm for automatic detection of speech prominence. We present a comparative analysis of various acoustic features for word prominence detection and report results on a spoken dialog corpus with manually assigned prominence labels. The focus is on features such as spectral intensity and speech rate, which are extracted directly from speech using a correlation-based approach that requires no explicit linguistic or phonetic knowledge. Additionally, various pitch-based measures are studied with respect to their ability to discriminate prominence. A parametric scheme for modeling the pitch plateau is proposed, and this feature alone is found to outperform traditional local pitch statistics. Two sets of experiments explore the usefulness of the acoustic score generated from these features. The first set follows the more traditional approach of word prominence detection on a manually tagged corpus; a 76.8% classification accuracy was achieved on a corpus of role-playing spoken dialogs. Because speech prominence is difficult to tag manually into discrete levels (categories), the second set of experiments evaluates the score indirectly. Specifically, experiments on the Switchboard corpus show that the proposed acoustic score can discriminate between content words and function words in a statistically significant way. The relation between speech prominence and content/function words is also explored. Since prominent words tend to be predominantly content words, and since content words can be marked automatically from text-derived part-of-speech (POS) information, the proposed acoustic score can be indirectly cross-validated through POS information.
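The indirect evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each word already carries a POS tag and a scalar acoustic prominence score, splits the words into content and function classes by tag, and compares the two groups of scores. The abstract does not say which statistical test the authors used, so Welch's t statistic is an assumption here, as are the tag prefixes chosen to define content words, the helper names, and the toy values, which are invented for illustration and are not data from the paper.

```python
from math import sqrt
from statistics import mean, variance

# Penn Treebank-style tag prefixes treated as content words.
# This choice is an assumption of the sketch, not taken from the paper.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_content_word(pos_tag):
    """Return True if the tag marks a noun, verb, adjective, or adverb."""
    return pos_tag.startswith(CONTENT_PREFIXES)


def split_scores(tagged_scores):
    """Split (word, pos_tag, score) triples into content and function score lists."""
    content, function = [], []
    for _word, pos_tag, score in tagged_scores:
        (content if is_content_word(pos_tag) else function).append(score)
    return content, function


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy values, invented for illustration only; not results from the paper.
toy = [
    ("the", "DT", 0.10), ("doctor", "NN", 0.80), ("will", "MD", 0.20),
    ("see", "VB", 0.60), ("you", "PRP", 0.30), ("tomorrow", "NN", 0.90),
    ("at", "IN", 0.15), ("noon", "NN", 0.70),
]

content, function = split_scores(toy)
t_stat = welch_t(content, function)
print(f"content mean={mean(content):.3f}, function mean={mean(function):.3f}")
print(f"Welch t statistic={t_stat:.2f}")
```

A large positive statistic would indicate that content words receive higher scores on average than function words. A real evaluation would also need a p-value and a far larger sample, such as the Switchboard data the abstract mentions.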
