One Model is Not Enough: Ensembles for Isolated Sign Language Recognition.

Author information

Department of Cybernetics and New Technologies for the Information Society, University of West Bohemia, Technická 8, 301 00 Pilsen, Czech Republic.

Gymnasium of Johannes Kepler, Parléřova 2/118, 169 00 Prague, Czech Republic.

Publication information

Sensors (Basel). 2022 Jul 4;22(13):5043. doi: 10.3390/s22135043.

Abstract

In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.
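The weighted ensembling described in the abstract can be illustrated with a minimal sketch. It assumes each model outputs per-class probabilities on a held-out set, and it swaps in a plain Gaussian-perturbation search for CMA-ES so that the example depends only on NumPy. The function names (`ensemble_predict`, `accuracy`, `search_weights`) and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def ensemble_predict(probs, weights):
    """Weighted soft voting.

    probs:   array of shape (n_models, n_samples, n_classes)
    weights: array of shape (n_models,), normalized here to sum to 1
    """
    w = np.abs(np.asarray(weights, dtype=float))
    w = w / w.sum()
    combined = np.tensordot(w, probs, axes=1)  # -> (n_samples, n_classes)
    return combined.argmax(axis=1)

def accuracy(probs, weights, labels):
    """Fraction of samples the weighted ensemble classifies correctly."""
    return float((ensemble_predict(probs, weights) == labels).mean())

def search_weights(probs, labels, iters=200, pop=8, sigma=0.3, seed=0):
    """Simple evolutionary search over ensemble weights.

    This is a stand-in for CMA-ES (an assumption of this sketch): it keeps
    the best weight vector seen so far and samples Gaussian perturbations
    around it, without adapting a covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n_models = probs.shape[0]
    best_w = np.ones(n_models)  # start from uniform weights
    best_acc = accuracy(probs, best_w, labels)
    for _ in range(iters):
        candidates = best_w + sigma * rng.standard_normal((pop, n_models))
        for w in candidates:
            if np.abs(w).sum() == 0.0:  # skip degenerate all-zero weights
                continue
            acc = accuracy(probs, w, labels)
            if acc > best_acc:
                best_w, best_acc = w, acc
    best_w = np.abs(best_w)
    return best_w / best_w.sum(), best_acc

# Toy data: 2 models, 3 samples, 2 classes (made-up numbers).
toy_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[0.4, 0.6], [0.7, 0.3], [0.1, 0.9]],
])
toy_labels = np.array([0, 1, 0])
found_weights, found_acc = search_weights(toy_probs, toy_labels)
```

In practice the weights would be fitted on data separate from the test set, and a library implementation of CMA-ES could replace `search_weights` while the objective (`accuracy`) stays the same.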

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61e1/9269724/74cbde0d8bdc/sensors-22-05043-g001.jpg
