Processing group delay spectrograms for study of formant and harmonic contours in speech signals.

Affiliations

International Institute of Information Technology, Hyderabad 500032, India.

Department of Artificial Intelligence and Data Science, Koneru Lakshmaiah Education Foundation, Hyderabad 500075, India.

Publication information

J Acoust Soc Am. 2024 Oct 1;156(4):2422-2433. doi: 10.1121/10.0032364.

DOI:10.1121/10.0032364

PMID:39392353

Abstract

This paper studies formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the instantaneous narrowband equivalent GD spectrogram, likewise derived using the modified SFF analysis. For synthetic speech signals, the observed formant contours match the ground truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours approximately match the given ground truth formant contours, mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the formant contours in the display, automatic extraction of the formant frequencies needs further processing, which requires logic for eliminating spurious points without forcing the number of formants.
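
The definition of the GD spectrum given above can be illustrated with a minimal sketch. It is not the modified SFF analysis used in the paper. It assumes the standard DFT identity for a finite sequence x[n], namely GD = Re(Y/X), where X is the DFT of x[n] and Y is the DFT of n·x[n]. That identity gives the negative phase derivative directly, so the phase never has to be computed or unwrapped. The function name `group_delay`, the `n_fft` parameter, and the small floor `eps` are assumptions made for this illustration.

```python
import numpy as np

def group_delay(x, n_fft=None, eps=1e-12):
    """Group delay (in samples) of a finite sequence, via the DFT identity.

    Illustrative only: this is the textbook identity GD = Re(Y / X),
    not the modified SFF method described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    n_fft = len(x) if n_fft is None else n_fft
    n = np.arange(len(x))

    X = np.fft.rfft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of n * x[n]

    # Re(Y * conj(X)) / |X|^2 equals the negative phase derivative.
    power = np.abs(X) ** 2
    # eps (an assumed floor) avoids division by zero at spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Usage: an impulse delayed by 5 samples has a constant group delay of 5.
x = np.zeros(16)
x[5] = 1.0
gd = group_delay(x)
```

For real speech frames, this plain estimate becomes very spiky wherever the magnitude spectrum is close to zero.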
