Shen Junxiao, Khaldi Khadija, Zhou Enmin, Surale Hemant Bhaskar, Karlson Amy
IEEE Trans Vis Comput Graph. 2024 Nov;30(11):7118-7128. doi: 10.1109/TVCG.2024.3456198. Epub 2024 Oct 10.
Text entry with word-gesture keyboards (WGK) is emerging as a popular method and is becoming a key interaction technique for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments produces divergent word-gesture trajectory data patterns, which complicates decoding trajectories into text. Template-matching decoding methods, such as SHARK2 [32], are commonly used in these WGK systems because they are easy to implement and configure. However, these methods are prone to decoding errors on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, but they have their own limitations: they require extensive training data and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale, coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It significantly outperforms SHARK2, by 37.2%, and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder is only 4 MB in size after quantization, with no loss of accuracy, and it can operate in real time, executing in just 97 milliseconds on Quest 3.
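To make the two central notions in the abstract concrete, the following is a minimal sketch of what coarse discretization of a word-gesture trajectory and a Top-k accuracy measure might look like. It is not the authors' implementation: the grid size (10 columns by 3 rows), the assumption that points are normalized to the keyboard's bounding box, the merging of consecutive repeated cells, and the function names `discretize` and `top_k_accuracy` are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def discretize(trajectory: Sequence[Point], cols: int = 10, rows: int = 3) -> List[int]:
    """Map (x, y) points to coarse grid-cell ids and merge consecutive repeats.

    Assumes x and y are normalized to [0, 1] over the keyboard area; the
    10 x 3 grid is a hypothetical choice, not a value taken from the paper.
    """
    tokens: List[int] = []
    for x, y in trajectory:
        # Clamp so points slightly outside the keyboard land in an edge cell.
        col = min(max(int(x * cols), 0), cols - 1)
        row = min(max(int(y * rows), 0), rows - 1)
        cell = row * cols + col
        # Drop repeats so the sequence reflects the path, not the sampling rate.
        if not tokens or tokens[-1] != cell:
            tokens.append(cell)
    return tokens


def top_k_accuracy(ranked: Sequence[Sequence[str]], targets: Sequence[str], k: int = 4) -> float:
    """Fraction of targets found among the first k ranked candidates."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = sum(target in candidates[:k] for candidates, target in zip(ranked, targets))
    return hits / len(targets)


if __name__ == "__main__":
    gesture = [(0.05, 0.10), (0.07, 0.12), (0.55, 0.50), (1.00, 0.90)]
    print(discretize(gesture))
    print(top_k_accuracy([["hello", "help", "hell", "held", "helm"]], ["help"]))
```

Under these assumptions, a coarse grid discards fine-grained differences in keyboard size and sampling rate, which is one plausible reason such a representation could transfer across mid-air and on-surface systems. Top-4 accuracy then counts a prediction as correct when the intended word appears among the decoder's four highest-ranked candidates.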