Adi Yossi, Keshet Joseph, Cibelli Emily, Goldrick Matthew
Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel.
Department of Linguistics, Northwestern University, Evanston, IL, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2017 Mar;2017:2422-2426. doi: 10.1109/ICASSP.2017.7952591. Epub 2017 Jun 19.
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture composed of two jointly trained modules: a recurrent neural network (RNN) and a structured prediction model. The RNN outputs serve as feature functions for the structured model. The overall model is trained with a structured loss function that can be tailored to the given segmentation task. We demonstrate the effectiveness of our method by applying it to two simple tasks commonly used in phonetic studies: word segmentation and voice onset time segmentation. Results suggest the proposed model is superior to previous methods, obtaining state-of-the-art results on the tested datasets.
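To make the training objective concrete, the following is a minimal sketch of structured prediction with a task-specific loss, not the implementation used in the paper. It assumes a segmentation with two boundaries (start, end), a fixed array `feats` standing in for per-frame RNN outputs, a linear scoring function, an absolute boundary deviation as the task loss, and exhaustive decoding; all function names and the toy data are hypothetical. In a jointly trained model, the same loss would presumably be backpropagated into the RNN rather than applied to fixed features.

```python
import numpy as np

def task_loss(y_true, y_pred):
    """Assumed task loss: total absolute deviation of the two boundaries, in frames."""
    return float(sum(abs(a - b) for a, b in zip(y_true, y_pred)))

def score(w, feats, y):
    """Linear score of a segmentation y = (start, end) from per-frame features."""
    start, end = y
    return float(w[0] @ feats[start] + w[1] @ feats[end])

def decode(w, feats, y_true=None):
    """Exhaustive search over start < end; loss-augmented when y_true is given."""
    num_frames = feats.shape[0]
    best, best_val = None, -np.inf
    for start in range(num_frames - 1):
        for end in range(start + 1, num_frames):
            val = score(w, feats, (start, end))
            if y_true is not None:
                val += task_loss(y_true, (start, end))
            if val > best_val:
                best, best_val = (start, end), val
    return best, best_val

def structured_hinge(w, feats, y_true):
    """Structured hinge loss and the loss-augmented segmentation that attains it."""
    y_aug, aug_val = decode(w, feats, y_true)
    return max(0.0, aug_val - score(w, feats, y_true)), y_aug

def sgd_step(w, feats, y_true, lr=1.0):
    """One subgradient step on the structured hinge loss (updates w in place)."""
    loss, y_aug = structured_hinge(w, feats, y_true)
    if loss > 0:
        for k in range(2):
            w[k] += lr * (feats[y_true[k]] - feats[y_aug[k]])
    return loss

# Toy data (hypothetical): 10 frames, 2-dimensional features,
# with one feature active at the true start and the other at the true end.
feats = np.zeros((10, 2))
feats[3, 0] = 1.0
feats[7, 1] = 1.0
y_true = (3, 7)

w = np.zeros((2, 2))  # one weight vector per boundary
for _ in range(10):
    sgd_step(w, feats, y_true)

pred, _ = decode(w, feats)
final_loss, _ = structured_hinge(w, feats, y_true)
print(pred, final_loss)
```

Swapping `task_loss` for a different measure (for example, one that ignores deviations under a tolerance) changes what the model is penalized for without touching the rest of the training loop, which is the sense in which the loss can be tailored to the segmentation task.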