Kang John S, Moon Kee S, Lee Sung Q, Satterlee Nicholas, Zuo Xiaowei
Department of Mechanical Engineering, San Diego State University, San Diego, CA 92182, USA.
Sensors (Basel). 2025 Apr 21;25(8):2624. doi: 10.3390/s25082624.
This paper introduces a wearable silent text input system designed to capture text input through silent speech, without generating audible sound. The system integrates electromyography (EMG) and piezoelectric lead zirconate titanate (PZT) sensors in a miniaturized form that attaches to the chin, making it both comfortable to wear and esthetically pleasing. The EMG sensor records muscle activity linked to specific tongue and jaw movements, while the PZT sensor measures the minute vibrations and pressure changes in the chin skin caused by silent speech. Data from both sensors are analyzed to capture the timing and intensity of the silent speech signals, allowing the extraction of key features in both the time and frequency domains. Several machine learning (ML) models, including both feature-based and non-feature-based approaches commonly used for classification tasks, are employed and compared to detect and classify subtle variations in sensor signals associated with individual alphabet letters. To evaluate and compare the ML models, EMG and PZT signals for the eight most frequently used English letters are collected across one hundred trials each. Results showed that non-feature-based models, particularly Few-Shot Learning with fused EMG and PZT signals, achieved the highest accuracy (95.63%) and F1-score (95.62%). The proposed system's accuracy and real-time performance make it promising for silent text input and assistive communication applications.
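As a rough illustration of the time-domain and frequency-domain feature extraction and the EMG/PZT fusion mentioned in the abstract, the sketch below computes a few commonly used descriptors for one signal window per sensor and concatenates them. The specific features (RMS, mean absolute value, zero crossings, mean and peak frequency), the function names, and the sampling rate are assumptions chosen for illustration; the abstract does not state which features or parameters the authors used.

```python
import numpy as np


def time_features(x):
    """Time-domain descriptors of a 1-D window: RMS, mean absolute value, zero crossings."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    mav = np.mean(np.abs(x))
    signs = np.signbit(x)
    zero_crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return np.array([rms, mav, float(zero_crossings)])


def freq_features(x, fs):
    """Frequency-domain descriptors: power-weighted mean frequency and peak frequency (Hz)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = power.sum()
    mean_freq = (freqs * power).sum() / total if total > 0 else 0.0
    peak_freq = freqs[np.argmax(power)]
    return np.array([mean_freq, peak_freq])


def fused_features(emg, pzt, fs):
    """Concatenate the descriptors of both sensors into one feature vector.

    Assumes both windows were sampled at the same rate fs (hypothetical).
    """
    return np.concatenate([
        time_features(emg), freq_features(emg, fs),
        time_features(pzt), freq_features(pzt, fs),
    ])


if __name__ == "__main__":
    fs = 1000.0  # assumed sampling rate in Hz, not taken from the paper
    t = np.arange(1000) / fs
    emg_window = np.sin(2 * np.pi * 50 * t)        # synthetic stand-in for an EMG window
    pzt_window = 0.5 * np.sin(2 * np.pi * 20 * t)  # synthetic stand-in for a PZT window
    print(fused_features(emg_window, pzt_window, fs))
```

A vector like this would feed the feature-based classifiers; the non-feature-based models described in the abstract would instead consume the raw or minimally processed signal windows.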