Department of Electrical and Computer Engineering, The University of Texas at Austin.
Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin.
J Speech Lang Hear Res. 2023 Aug 17;66(8S):3206-3221. doi: 10.1044/2023_JSLHR-22-00319. Epub 2023 May 5.
Current electromagnetic tongue tracking devices are not amenable to daily use and are thus unsuitable for silent speech interfaces and other applications. We recently developed MagTrack, a novel wearable electromagnetic articulograph for tongue tracking. This study aimed to validate MagTrack for potential silent speech interface applications.
We conducted two experiments: (a) classification of eight isolated vowels in consonant-vowel-consonant form and (b) continuous silent speech recognition. Both experiments used data collected with MagTrack from healthy adult speakers. Vowel classification performance was measured by accuracy, and continuous silent speech recognition performance was measured by phoneme error rate. The results were then compared with those obtained in a prior study using data collected with a commercial electromagnetic articulograph.
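For readers unfamiliar with the second metric, the sketch below shows how a phoneme error rate is commonly computed: the edit distance (substitutions, deletions, and insertions) between the recognized and reference phoneme sequences, divided by the reference length. This is the conventional definition and is assumed here; the abstract does not state the exact scoring procedure used in the study, and the function name and example phoneme labels are illustrative only.

```python
def phoneme_error_rate(reference, hypothesis):
    """Edit distance between two phoneme sequences divided by the reference length.

    Assumption: the conventional definition of phoneme error rate, not
    necessarily the exact scoring procedure used in the study.
    """
    if not reference:
        raise ValueError("reference must contain at least one phoneme")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j phonemes of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing
                curr[j - 1] + 1,     # insertion: extra hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Illustrative phoneme labels: one of four reference phonemes is dropped.
per = phoneme_error_rate(["HH", "AH", "L", "OW"], ["HH", "AH", "L"])
print(f"{per:.2%}")
```

Because insertions count as errors, a phoneme error rate can exceed 100%, and lower values indicate better recognition.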
Isolated vowel classification using MagTrack achieved an average accuracy of 89.74% when leveraging all MagTrack signals (x, y, z coordinates; orientation; and magnetic signals), which outperformed the accuracy obtained with commercial electromagnetic articulograph data (only y, z coordinates) in our previous study. Continuous speech recognition from two subjects using MagTrack achieved phoneme error rates of 73.92% and 66.73%, respectively. For the second subject, the commercial electromagnetic articulograph achieved a phoneme error rate of 64.53% (compared with 66.73% using MagTrack data).
MagTrack showed results comparable to those of the commercial electromagnetic articulograph when using the same localized information. Adding raw magnetic signals would improve the performance of MagTrack. Our preliminary testing demonstrated MagTrack's potential as a lightweight wearable device for silent speech interfaces. This work also lays the foundation for other MagTrack applications, including visual feedback-based speech therapy and second language learning.