The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, PR China.
J Voice. 2013 Mar;27(2):259.e7-259.e16. doi: 10.1016/j.jvoice.2012.10.011. Epub 2013 Jan 5.
To achieve accurate, automatic on/off control of an electrolarynx (EL), an artificial neural network (ANN) was introduced for switch identification based on visual information from the lips and implemented in an experimental system (ANN-EL). The objective was to confirm the feasibility of the ANN method and to evaluate the performance of ANN-EL in Mandarin speech.
Five volunteers (one laryngectomee and four normal speakers) participated in all experiments. First, the trained ANN was tested to assess the switch identification performance of the ANN method. Then, voice initiation/termination time, speech fluency, and word intelligibility were measured and compared with those of button-EL and video-EL to evaluate the on/off control performance of ANN-EL.
The test showed that the ANN method performed accurate switch identification (>99%). ANN-EL was as fast as normal voice and button-EL in onset control, but slightly slower in offset control. ANN-EL could provide a fluent voice source with rare breaks (<1%) during continuous speech. The results also indicated that the on/off control performance of ANN-EL had a significant impact on perception, lowering word intelligibility compared with button-EL. However, words produced by ANN-EL were approximately 20% more intelligible than those produced by video-EL.
The ANN method proved feasible and effective for switch identification based on visual information from the lips. ANN-EL could provide accurate on/off control for fluent Mandarin speech.