Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, One Bowdoin Square, 11th Floor, Boston, Massachusetts 02114, USA.
Department of Otolaryngology Head and Neck Surgery, Division of Laryngology, Stanford University School of Medicine, Stanford University, 801 Welch Road, Stanford, California 94305, USA.
J Acoust Soc Am. 2022 Jul;152(1):580. doi: 10.1121/10.0012734.
Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing work on connected speech uses methods originally designed for the analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along the horizontal and vertical axes of this representation, which correspond to the temporal and spectral dynamics of speech, were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As a first step in demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. The results showed that the proposed method has, at minimum, significantly greater discriminatory power than the existing alternative approaches.
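The two models described above can be illustrated with a minimal sketch. The abstract does not specify how sub-band energies are computed or how the probabilities are estimated, so everything here is an assumption for illustration: the input is taken to be a matrix of sub-band energies in dB (rows are frequency sub-bands, columns are time segments), and each probability is approximated by a normalized histogram of energy differences. The names `change_histogram`, `spectral_temporal_models`, and `bin_edges` are hypothetical and not from the study.

```python
import numpy as np

def change_histogram(energy_db, axis, bin_edges):
    """Empirical probability of energy changes between neighbours along `axis`.

    Assumption: a normalized histogram stands in for the probability model;
    the study's actual estimator is not given in the abstract.
    """
    deltas = np.diff(energy_db, axis=axis).ravel()
    # Clip so that out-of-range changes fall into the outermost bins.
    deltas = np.clip(deltas, bin_edges[0], bin_edges[-1])
    counts, _ = np.histogram(deltas, bins=bin_edges)
    total = counts.sum()
    return counts / total if total > 0 else counts.astype(float)

def spectral_temporal_models(energy_db, bin_edges):
    """Return (spectral, temporal) change distributions.

    energy_db: array of shape (n_subbands, n_segments), assumed layout.
    spectral: changes between consecutive sub-bands at a fixed time segment.
    temporal: changes within a sub-band between consecutive time segments.
    """
    spectral = change_histogram(energy_db, axis=0, bin_edges=bin_edges)
    temporal = change_histogram(energy_db, axis=1, bin_edges=bin_edges)
    return spectral, temporal

# Toy input: 2 sub-bands x 3 time segments (made-up values, not study data).
toy_energy = np.array([[0.0, 1.0, 2.0],
                       [10.0, 11.0, 12.0]])
toy_edges = np.array([-20.0, 0.0, 5.0, 20.0])
toy_spectral, toy_temporal = spectral_temporal_models(toy_energy, toy_edges)
```

In this toy input every change across sub-bands is 10 dB and every change across time segments is 1 dB, so each distribution places all of its mass in a single bin. In a diagnostic setting such distributions could serve as features for a classifier, though the abstract does not describe that stage.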