McLoughlin Ian, Li Jingjie, Song Yan, Sharifzadeh Hamid R
School of Computing, The University of Kent, Medway, UK.
National Engineering Laboratory of Speech and Language Information Processing, The University of Science and Technology of China, Hefei, Anhui, People's Republic of China.
Healthc Technol Lett. 2017 Jun 9;4(4):129-133. doi: 10.1049/htl.2016.0103. eCollection 2017 Aug.
Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state of the art.