Speech intelligibility estimation using multi-resolution spectral features for speakers undergoing cancer treatment.

Author information

Kim Jonathan C, Rao Hrishikesh, Clements Mark A

Affiliation

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Publication information

J Acoust Soc Am. 2014 Oct;136(4):EL315-21. doi: 10.1121/1.4896410.

DOI:10.1121/1.4896410

PMID:25324116

Abstract

Head and neck cancer can significantly hamper speech production, which often reduces speech intelligibility. A method of extracting spectral features is presented. The method uses a multi-resolution sinusoidal transform scheme, which enables better representation of spectral and harmonic characteristics. Regression methods were used to predict interval-scaled intelligibility scores of utterances in the NKI-CCRT speech corpus. Including these features lowered the mean squared estimation error from 0.43 to 0.39 on a scale from 1 to 7, with a p-value less than 0.001. For binary intelligibility classification, including them improved accuracy by 5.0 percentage points when tested on a disjoint set.
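
As a rough illustration of what analyzing a signal at multiple resolutions can look like, the sketch below picks the strongest spectral peaks of one frame at several window lengths. It is not the authors' multi-resolution sinusoidal transform, whose details the abstract does not give: it swaps in plain Hann-windowed DFT peak picking, and the function names, window lengths, and sample rate are all assumptions chosen for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum (bins 0..N/2) computed with a direct DFT."""
    n = len(frame)
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectral_peaks(frame, sample_rate, n_peaks=2):
    """Frequencies (Hz) of the n_peaks strongest local maxima, in ascending order."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    # A bin is a peak when it is strictly larger than both of its neighbors.
    peak_bins = [
        k for k in range(1, len(mags) - 1)
        if mags[k - 1] < mags[k] > mags[k + 1]
    ]
    strongest = sorted(peak_bins, key=lambda k: mags[k], reverse=True)[:n_peaks]
    return sorted(k * sample_rate / n for k in strongest)


def multi_resolution_peaks(signal, sample_rate, window_lengths=(128, 256, 512)):
    """Peak frequencies per window length (hypothetical lengths, in samples).

    Each analysis uses the first `n` samples of the signal, so the signal
    must be at least as long as the largest window.
    """
    return {
        n: spectral_peaks(signal[:n], sample_rate)
        for n in window_lengths
    }
```

Short windows localize events in time but give coarse frequency bins (`sample_rate / n` Hz each), while long windows resolve closely spaced harmonics at the cost of time resolution. Combining both views is the general motivation for a multi-resolution analysis.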