Multimedia Department, Polish-Japanese Academy of Information Technology, 02-008 Warsaw, Poland.
Sensors (Basel). 2022 Apr 21;22(9):3188. doi: 10.3390/s22093188.
Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient's quality of life. The procedure results in the loss of the natural voice, which constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a statistical parametric speech synthesis system for a patient with laryngeal cancer, based on the patient's speech samples recorded shortly before the surgery, and to check whether it was possible to generate speech of a quality close to that of the original recordings. The recordings used a representative corpus of the Polish language consisting of 2150 sentences. The recorded voice showed signs of dysphonia, which was confirmed by the auditory-perceptual RBH scale (roughness, breathiness, hoarseness) and by acoustic analysis using the Acoustic Voice Quality Index (AVQI). The speech synthesis model was trained using the Merlin repository. Twenty-five experts participated in the MUSHRA listening tests and rated the patient's synthetic voice at 69.4 on a 0-100 scale, relative to the professional voice-over talent recording, which is a very good result. The authors compared the quality of this synthetic voice with that of another synthetic speech model trained on the same corpus, but with speech samples recorded by a voice-over talent. The same experts rated that voice at 63.63, which means that the synthetic voice of the patient with laryngeal cancer obtained a higher score than the synthetic voice built from the voice-over talent's recordings. The method thus enabled the creation of a statistical parametric speech synthesizer for patients awaiting total laryngectomy. As a result, the solution would improve the patient's quality of life and mental wellbeing.
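As an illustration of how MUSHRA ratings such as those reported above are commonly aggregated, the following minimal Python sketch computes a mean score and an approximate 95% confidence interval per system. The function name `mushra_summary`, the system labels, and all rating values are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state how the authors computed their averages.

```python
from math import sqrt
from statistics import mean, stdev


def mushra_summary(ratings):
    """Return (mean, half-width of an approximate 95% CI) for 0-100 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 0 <= r <= 100 for r in ratings):
        raise ValueError("MUSHRA ratings must lie in [0, 100]")
    m = mean(ratings)
    # Normal approximation; a t-quantile would be more appropriate
    # for small listener panels.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return m, half_width


# Hypothetical listener ratings (NOT the study's data), one list per system.
ratings_by_system = {
    "reference": [100, 95, 98, 100],
    "patient_tts": [70, 65, 72, 68],
    "talent_tts": [60, 66, 62, 64],
}

summary = {name: mushra_summary(r) for name, r in ratings_by_system.items()}

for name, (m, hw) in summary.items():
    print(f"{name}: {m:.2f} +/- {hw:.2f}")
```

Reporting an interval alongside each mean makes it easier to judge whether a difference between two systems, such as 69.4 versus 63.63, exceeds listener-to-listener variation.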