IEEE Trans Vis Comput Graph. 2024 Nov;30(11):7441-7451. doi: 10.1109/TVCG.2024.3456179. Epub 2024 Oct 10.
Text entry is a critical capability for any modern computing experience, and lightweight augmented reality (AR) glasses are no exception. Because they are designed for all-day wearability, lightweight AR glasses are limited in their ability to include the multiple cameras needed for an extensive hand-tracking field of view. This constraint underscores the need for an additional input device. We propose a system to address this gap: RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of gesture trajectories and inertial measurement unit (IMU) sensors for hand tracking. This method offers an intuitive experience similar to the raycast-based mid-air gesture typing found in VR headsets, allowing hand movements to be translated seamlessly into cursor navigation. To enhance both accuracy and input speed, we propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual language model. The framework fuses the scores from the three models to predict the most likely words with higher precision. We conduct comparative and longitudinal studies that demonstrate two key findings. First, RingGesture is effective overall, achieving an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM. Second, the Score Fusion framework outperforms a conventional word prediction framework, Naive Correction, with a 28.2% improvement in uncorrected Character Error Rate, leading to a 55.2% improvement in text entry speed for RingGesture. Additionally, RingGesture received a System Usability Score of 83, signifying its excellent usability.
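To make the idea of fusing scores from three models concrete, the following is a minimal sketch and not the authors' implementation. It assumes each model returns a log-probability per candidate word and that the fused score is a weighted sum; the function name `fuse_scores`, the weights, and the `floor` penalty for words a model did not propose are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    decoder: Dict[str, float],
    speller: Dict[str, float],
    language_model: Dict[str, float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
    floor: float = -20.0,  # hypothetical log-score for a word a model did not propose
) -> List[str]:
    """Rank candidate words by a weighted sum of three models' log-scores.

    This is an assumed fusion rule (linear combination of log-probabilities),
    used only to illustrate the general idea of score fusion.
    """
    # Consider every word proposed by at least one of the three models.
    candidates = set(decoder) | set(speller) | set(language_model)

    fused = {}
    for word in candidates:
        fused[word] = (
            weights[0] * decoder.get(word, floor)
            + weights[1] * speller.get(word, floor)
            + weights[2] * language_model.get(word, floor)
        )

    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(fused, key=lambda w: (-fused[w], w))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    gesture = {"hello": -0.4, "help": -1.2, "hells": -2.5}
    spelling = {"hello": -0.3, "help": -1.5}
    context = {"hello": -1.0, "help": -0.8, "held": -1.1}
    print(fuse_scores(gesture, spelling, context))
```

In this sketch, a word supported by all three models outranks a word that only one model proposes, which is the intuition behind combining a gesture decoder, a spelling corrector, and a contextual language model rather than relying on any single one.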