Ceolini Enea, Frenkel Charlotte, Shrestha Sumit Bam, Taverni Gemma, Khacef Lyes, Payvand Melika, Donati Elisa
Institute of Neuroinformatics, University of Zurich, ETH Zurich, Zurich, Switzerland.
ICTEAM Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium.
Front Neurosci. 2020 Aug 5;14:637. doi: 10.3389/fnins.2020.00637. eCollection 2020.
Hand gestures are a form of non-verbal communication that individuals use in conjunction with speech. Nowadays, with the increasing use of technology, hand-gesture recognition is considered an important aspect of Human-Machine Interaction (HMI), allowing the machine to capture and interpret the user's intent and to respond accordingly. The ability to discriminate between human gestures can help in several applications, such as assisted living, healthcare, neuro-rehabilitation, and sports. Recently, multi-sensor data fusion mechanisms have been investigated to improve discrimination accuracy. In this paper, we present a sensor fusion framework that integrates complementary systems: the electromyography (EMG) signal from muscles and visual information. While improving accuracy and robustness, this multi-sensor approach introduces the disadvantage of high computational cost, which grows exponentially with the number of sensors and measurements. Moreover, the large amount of data to process can increase classification latency, which is critical in real-world scenarios such as prosthetic control. Neuromorphic technologies can overcome these limitations, since they enable real-time, parallel processing at low power consumption. We present a fully neuromorphic sensor fusion approach for hand-gesture recognition comprising an event-based vision sensor and three different neuromorphic processors. In particular, we used the Dynamic Vision Sensor (DVS) event-based camera and two neuromorphic platforms, Loihi and ODIN + MorphIC. The EMG signals were recorded with traditional electrodes and then converted into spikes to be fed into the chips. We collected a dataset of five sign-language gestures in which the visual and electromyography signals are synchronized. We compared this fully neuromorphic approach to a baseline implemented with traditional machine learning approaches on a portable GPU system.
To comply with each chip's constraints, we designed chip-specific spiking neural networks (SNNs) for sensor fusion that achieved classification accuracy comparable to the software baseline. These neuromorphic alternatives increase inference time by 20-40% with respect to the GPU system but have a significantly smaller energy-delay product (EDP), which makes them between 30× and 600× more efficient. The proposed work represents a new benchmark that moves neuromorphic computing toward real-world scenarios.
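The efficiency claim combines energy and latency into a single figure of merit. The sketch below shows how an energy-delay product comparison can favor a processor that is somewhat slower but draws far less energy per inference; all numbers are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def energy_delay_product(energy_uj, latency_ms):
    """EDP = energy per inference x latency; lower is better."""
    return energy_uj * latency_ms

# Hypothetical figures: the GPU is faster per inference, but the
# neuromorphic chip uses orders of magnitude less energy.
gpu_edp = energy_delay_product(energy_uj=25000.0, latency_ms=5.0)
chip_edp = energy_delay_product(energy_uj=30.0, latency_ms=6.5)  # ~30% slower

improvement = gpu_edp / chip_edp  # EDP ratio; >1 favors the chip
```

With these placeholder numbers the chip's 30% latency penalty is dwarfed by its ~800× lower energy per inference, yielding an EDP advantage of several hundred times, which is the shape of the trade-off the abstract reports.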