Image representation of the acoustic signal: An effective tool for modeling spectral and temporal dynamics of connected speech.

Affiliations

Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, One Bowdoin Square, 11th Floor, Boston, Massachusetts 02114, USA.

Department of Otolaryngology Head and Neck Surgery, Division of Laryngology, Stanford University School of Medicine, Stanford University, 801 Welch Road, Stanford, California 94305, USA.

Publication information

J Acoust Soc Am. 2022 Jul;152(1):580. doi: 10.1121/10.0012734.

Abstract

Recent studies have advocated the use of connected speech in clinical voice and speech assessment. This recommendation rests on the presence of clinically relevant information in the onsets, offsets, and variations of connected speech. Existing work on connected speech relies on methods originally designed for the analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric analysis approach based on a two-dimensional, temporal-spectral representation of speech. Variations along the horizontal (temporal) and vertical (spectral) axes of this representation were quantified using two statistical models. The first, a spectral model, was defined as the probability of a change in energy between two consecutive frequency sub-bands within a fixed time segment. The second, a temporal model, was defined as the probability of a change in the energy of a sub-band between consecutive time segments. As a first step toward demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. The data revealed that the proposed method has, at minimum, significant discriminatory power relative to the existing alternative approaches.
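The abstract does not say how the two probability models are estimated. One plausible reading is sketched below, and it is not the authors' implementation. It treats the temporal-spectral image as a matrix of sub-band energies, with rows as frequency sub-bands and columns as time segments. It quantizes those energies into a few discrete levels and then estimates each model as a first-order transition-probability matrix, in the spirit of a gray-level co-occurrence matrix.

The function names (`quantize`, `transition_probs`, `spectral_model`, `temporal_model`), the uniform quantization, and the matrix layout are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple

Levels = List[List[int]]        # levels[k][t]: quantized energy of sub-band k at time segment t
ProbMatrix = List[List[float]]  # P[i][j] = P(next level j | current level i)


def quantize(energy_db: List[List[float]], n_levels: int) -> Levels:
    """Map each sub-band energy (dB) to one of n_levels equal-width bins.

    Assumption: uniform bins between the global min and max of the image;
    the abstract does not state how (or whether) energies are quantized.
    """
    flat = [v for row in energy_db for v in row]
    lo, hi = min(flat), max(flat)
    width = (hi - lo) / n_levels or 1.0  # a constant image would give width 0
    return [[min(int((v - lo) / width), n_levels - 1) for v in row]
            for row in energy_db]


def transition_probs(pairs: Iterable[Tuple[int, int]], n_levels: int) -> ProbMatrix:
    """Count (current, next) level pairs and normalize each row to sum to 1."""
    counts = [[0] * n_levels for _ in range(n_levels)]
    for current, nxt in pairs:
        counts[current][nxt] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Rows for levels that never occur stay all-zero instead of dividing by 0.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def spectral_model(levels: Levels, n_levels: int) -> ProbMatrix:
    """Hypothetical spectral model: transitions between consecutive
    sub-bands (k -> k + 1) within the same time segment t."""
    n_bands, n_frames = len(levels), len(levels[0])
    pairs = ((levels[k][t], levels[k + 1][t])
             for t in range(n_frames) for k in range(n_bands - 1))
    return transition_probs(pairs, n_levels)


def temporal_model(levels: Levels, n_levels: int) -> ProbMatrix:
    """Hypothetical temporal model: transitions between consecutive
    time segments (t -> t + 1) within the same sub-band k."""
    n_bands, n_frames = len(levels), len(levels[0])
    pairs = ((levels[k][t], levels[k][t + 1])
             for k in range(n_bands) for t in range(n_frames - 1))
    return transition_probs(pairs, n_levels)


if __name__ == "__main__":
    image = [[0.0, 30.0, 0.0],    # sub-band 0 over three time segments
             [0.0, 30.0, 30.0]]   # sub-band 1
    lv = quantize(image, n_levels=2)
    print(spectral_model(lv, 2))  # [[0.5, 0.5], [0.0, 1.0]]
    print(temporal_model(lv, 2))  # [[0.0, 1.0], [0.5, 0.5]]
```

The toy image shows the two models capturing different structure from the same data. The spectral matrix describes how energy changes across frequency within each segment, and the temporal matrix describes how each sub-band evolves over time. In a diagnostic setup, such matrices could be flattened into feature vectors for a classifier. That final step is also an assumption, not something the abstract specifies.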
