Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine.

Authors

Rastgoo Razieh, Kiani Kourosh, Escalera Sergio

Affiliations

Electrical and Computer Engineering Department, Semnan University, Semnan 3513119111, Iran.

Department of Mathematics and Informatics, University of Barcelona and Computer Vision Center, 08007 Barcelona, Spain.

Publication information

Entropy (Basel). 2018 Oct 23;20(11):809. doi: 10.3390/e20110809.

DOI:10.3390/e20110809

PMID:33266533

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7512373/

Abstract

In this paper, a deep learning approach based on the Restricted Boltzmann Machine (RBM) is used to perform automatic hand sign language recognition from visual data. We evaluate how well the RBM, as a deep generative model, can model the distribution of the input data to improve recognition of unseen data. Two modalities, RGB and depth, are considered as model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used, and the hand in each crop is detected using a Convolutional Neural Network (CNN). Three types of detected hand image are then generated for each modality and fed to RBMs. The outputs of the RBMs for the two modalities are fused in another RBM to recognize the sign label of the input image. The proposed multi-modal model is trained on all and on part of the American alphabet and digits in four publicly available datasets. We also evaluate the robustness of the proposed model against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusion methodology, achieves state-of-the-art results on four datasets: Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey's Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A.
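
The fusion idea in the abstract (one RBM per modality, with their outputs combined in a further RBM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the `RBM` class, the layer sizes, the learning rate, the use of one-step contrastive divergence (CD-1), and the random binary arrays standing in for flattened hand crops are all assumptions made for illustration. The sketch stops at fused features; the label recognition step described in the abstract would need a classifier on top.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1).

    Hypothetical helper class for illustration; not taken from the paper.
    """

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        # Mean squared reconstruction error, useful only as a rough monitor.
        return float(np.mean((v0 - v1) ** 2))


def train(rbm, data, epochs=20, lr=0.1):
    """Run full-batch CD-1 updates and return the per-epoch reconstruction error."""
    return [rbm.cd1_step(data, lr) for _ in range(epochs)]


rng = np.random.default_rng(0)

# Assumption: random binary vectors stand in for flattened, binarized hand crops.
rgb = (rng.random((64, 100)) < 0.3).astype(float)
depth = (rng.random((64, 100)) < 0.3).astype(float)

# One RBM per modality.
rbm_rgb = RBM(100, 32, rng)
rbm_depth = RBM(100, 32, rng)
train(rbm_rgb, rgb)
train(rbm_depth, depth)

# Fusion: concatenate the two hidden representations and feed them to a third RBM.
fused_input = np.concatenate(
    [rbm_rgb.hidden_probs(rgb), rbm_depth.hidden_probs(depth)], axis=1
)
rbm_fusion = RBM(fused_input.shape[1], 16, rng)
train(rbm_fusion, fused_input)

features = rbm_fusion.hidden_probs(fused_input)
print(features.shape)
```

The fusion RBM here receives hidden-unit probabilities rather than binary samples, a common simplification when stacking RBMs; whether the paper does the same is not stated in the abstract.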

