Graduate School of Information Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan.
PLoS One. 2018 Jun 14;13(6):e0193521. doi: 10.1371/journal.pone.0193521. eCollection 2018.
This study investigates quality prediction methods for synthesized speech using EEG. Training a predictive model on EEG is challenging because of the small number of training trials, the low signal-to-noise ratio, and the high correlation among independent variables. When a predictive model is trained with a machine learning algorithm, the features extracted from multi-channel EEG signals are usually organized as a vector, and their structure is ignored even though the signals are highly structured. This study predicts subjective rating scores of synthesized speech, including overall impression, valence, and arousal, by creating tensor-structured features instead of vectorized ones to exploit the structure of the features. We extracted various features and constructed a tensor feature that preserves their structure. Both vectorized and tensorial features were used to predict the rating scores, and the experimental results showed that prediction with tensorial features achieved better predictive performance. Among the features, the alpha and beta bands were more effective for prediction than the other features, which agrees with previous neurophysiological studies.
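The contrast between vectorized and tensor-structured features can be illustrated with a small NumPy sketch. This is not the study's pipeline: the abstract does not specify the model, so the dimensions, the synthetic data, and the rank-1 (channel weights times band weights) scoring rule below are all assumptions chosen only to show what keeping the structure can mean in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the study.
n_trials, n_channels, n_bands = 40, 32, 5

# Synthetic stand-in for per-trial EEG features (e.g. band power),
# kept as a tensor: trials x channels x frequency bands.
tensor_features = rng.standard_normal((n_trials, n_channels, n_bands))

# Vectorized view: each trial is flattened into one long vector,
# so a model no longer sees which entries share a channel or a band.
vector_features = tensor_features.reshape(n_trials, -1)

# Illustrative structured model (an assumption): a rank-1 weight tensor,
# with one weight per channel (u) and one weight per band (v).
u = rng.standard_normal(n_channels)
v = rng.standard_normal(n_bands)
tensor_scores = np.einsum("tcb,c,b->t", tensor_features, u, v)

# The same scores written as an ordinary linear model on the flat vector.
# The flat weight vector is the outer product of u and v.
w = np.outer(u, v).ravel()
vector_scores = vector_features @ w

# The structured model has far fewer free parameters than an
# unconstrained weight vector over the flattened features.
n_params_tensor = n_channels + n_bands
n_params_vector = n_channels * n_bands

print(tensor_features.shape, vector_features.shape)
print(np.allclose(tensor_scores, vector_scores))
print(n_params_tensor, n_params_vector)
```

With few training trials and strongly correlated inputs, the lower parameter count of a structured model is one plausible reason tensor features could help, although the abstract itself only reports that the tensorial features predicted better.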