

Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine.

Authors

Rastgoo Razieh, Kiani Kourosh, Escalera Sergio

Affiliations

Electrical and Computer Engineering Department, Semnan University, Semnan 3513119111, Iran.

Department of Mathematics and Informatics, University of Barcelona and Computer Vision Center, 08007 Barcelona, Spain.

Publication Information

Entropy (Basel). 2018 Oct 23;20(11):809. doi: 10.3390/e20110809.

Abstract

In this paper, a deep learning approach, the Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how the RBM, as a deep generative model, is capable of modeling the distribution of the input data for enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used, and the hands in these cropped images are detected using a Convolutional Neural Network (CNN). After that, three types of detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for the two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposed model against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey's Center for Vision, Speech and Signal Processing, the NYU dataset, and the ASL Fingerspelling A dataset.
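The pipeline described above trains one RBM per modality and fuses their hidden representations in a further RBM. As an illustrative sketch of the core building block, here is a minimal binary RBM trained with one-step contrastive divergence (CD-1) in NumPy. This is not the authors' implementation; the layer sizes, learning rate, and toy data are arbitrary assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary Restricted Boltzmann Machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible-unit biases
        self.b_h = np.zeros(n_hidden)   # hidden-unit biases
        self.lr = lr

    def hidden_probs(self, v):
        # P(h = 1 | v)
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        # P(v = 1 | h)
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # positive phase: hidden probabilities given the data
        h0 = self.hidden_probs(v0)
        # negative phase: one Gibbs step (sample hiddens, reconstruct)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # contrastive-divergence parameter updates
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Toy usage: fit a tiny RBM on random binary "images" and extract
# hidden activations, which would serve as features for a fusion stage.
data = rng.integers(0, 2, size=(32, 16)).astype(float)
rbm = RBM(n_visible=16, n_hidden=8)
for _ in range(100):
    rbm.cd1_step(data)
features = rbm.hidden_probs(data)
print(features.shape)  # (32, 8)
```

In the paper's setup, the hidden activations from the RGB-branch and Depth-branch RBMs would be concatenated and fed as the visible layer of a final fusion RBM, whose output feeds the sign classifier.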


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59d4/7512373/5c9b523f7881/entropy-20-00809-g001.jpg
