A Multimodal Data Processing System for LiDAR-Based Human Activity Recognition.

Publication information

IEEE Trans Cybern. 2022 Oct;52(10):10027-10040. doi: 10.1109/TCYB.2021.3085489. Epub 2022 Sep 19.

Abstract

Increasingly, the task of detecting and recognizing human actions has been delegated to neural networks that process camera or wearable-sensor data. Because cameras are sensitive to lighting conditions and wearable sensors provide only sparse data, neither modality alone can capture the data required to perform the task confidently. Range sensors such as light detection and ranging (LiDAR) can therefore complement these modalities and allow the environment to be perceived more robustly. Recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This article proposes a framework that tackles human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities performed by subjects using a region-based convolutional neural network (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset demonstrates that the model classifies human activities with high accuracy (90%). Furthermore, this framework can be used for sports analytics, understanding social behavior, and surveillance, and, perhaps most notably, by autonomous vehicles (AVs) for data-driven decision-making policies in urban areas and indoor environments.
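The abstract names the two recognition branches but does not say how point clouds are encoded or how the branches are combined. The following is a minimal sketch, assuming NumPy, of two ideas it alludes to. The first is encoding an unordered point cloud as a fixed-length, order-invariant vector in the spirit of 3-D modified Fisher vectors: Gaussians are placed on a uniform grid, and per-point Fisher-vector terms are pooled with sum, max, and min. The second is a simple weighted late fusion of per-class probabilities from an RGB branch and a point-cloud branch. The grid resolution, the shared sigma, the fusion weight `alpha`, and the helper names `grid_gmm`, `modified_fisher_vector`, and `late_fusion` are hypothetical. The article's actual feature layout and fusion strategy are not stated in the abstract.

```python
import numpy as np


def grid_gmm(resolution: int = 3):
    """Hypothetical GMM: Gaussians on a uniform resolution^3 grid over [-1, 1]^3,
    with equal weights and one shared isotropic standard deviation."""
    axis = np.linspace(-1.0, 1.0, resolution)
    mu = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    k = mu.shape[0]
    w = np.full(k, 1.0 / k)
    sigma = 1.0 / resolution  # assumed value, not taken from the article
    return w, mu, sigma


def modified_fisher_vector(points: np.ndarray, w, mu, sigma) -> np.ndarray:
    """Encode an (N, 3) point cloud as a fixed-length vector.

    Per-point Fisher-vector terms (weight, mean, and sigma gradients) are pooled
    with three symmetric functions (normalized sum, max, min), so the result
    does not depend on the order of the points.
    """
    n = points.shape[0]
    diff = (points[:, None, :] - mu[None, :, :]) / sigma            # (N, K, 3)
    # Soft assignments gamma_k(p): with a shared sigma, the Gaussian
    # normalizing constants cancel, leaving a softmax over scaled distances.
    log_p = -0.5 * np.sum(diff ** 2, axis=-1) + np.log(w)[None, :]  # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                       # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    g_alpha = (gamma - w[None, :]) / np.sqrt(w)[None, :]                           # (N, K)
    g_mu = gamma[..., None] * diff / np.sqrt(w)[None, :, None]                     # (N, K, 3)
    g_sigma = gamma[..., None] * (diff ** 2 - 1.0) / np.sqrt(2 * w)[None, :, None]  # (N, K, 3)

    per_point = np.concatenate([g_alpha[..., None], g_mu, g_sigma], axis=-1)  # (N, K, 7)
    pooled = [per_point.sum(axis=0) / n, per_point.max(axis=0), per_point.min(axis=0)]
    return np.concatenate(pooled, axis=-1).reshape(-1)                         # (K * 21,)


def late_fusion(rgb_probs, cloud_probs, alpha: float = 0.5) -> np.ndarray:
    """Weighted average of per-class probabilities from the two branches."""
    rgb_probs = np.asarray(rgb_probs, dtype=float)
    cloud_probs = np.asarray(cloud_probs, dtype=float)
    fused = alpha * rgb_probs + (1.0 - alpha) * cloud_probs
    return fused / fused.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-1.0, 1.0, size=(500, 3))  # synthetic stand-in for a LiDAR scan
    w, mu, sigma = grid_gmm(3)
    print(modified_fisher_vector(cloud, w, mu, sigma).shape)  # (567,)
    print(late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1]))
```

Pooling with symmetric functions keeps the descriptor invariant to point order. This matters because LiDAR returns an unordered set of points. The fixed length (27 Gaussians × 21 values here) also lets the descriptor feed a standard classifier. Late fusion is only one of several possible fusion schemes and is shown here because it is the simplest to illustrate.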