Center for Spoken Language Understanding, Division of Biomedical Computer Science, Department of Science & Engineering, School of Medicine, Oregon Health & Science University, Oregon, USA.
Augment Altern Commun. 2011 Mar;27(1):61-6. doi: 10.3109/07434618.2010.545078. Epub 2011 Feb 2.
This case study describes the generation of a synthetic voice resembling that of an individual before she underwent a laryngectomy. The voice was created from 6-7 min of recordings of this person speaking prior to the operation. Synthesis was based on statistical speech models, a method that allows models pre-trained on many speakers to be adapted to resemble an individual voice. The results of a listening test are reported, in which participants were asked to judge the similarity of the synthetic voice to the pre-operation (target) voice. Members of the patient's family were asked to make a similar judgment. These experiments show that, for most listeners, the voice is quite convincing despite the low quality and small quantity of adaptation data.