

Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement

Authors

Ma Chengming, Liu Qian, Dang Yaqi

Affiliation

College of Communication, Northwest Normal University, Lanzhou, China.

Publication

Front Psychol. 2021 Nov 8;12:769509. doi: 10.3389/fpsyg.2021.769509. eCollection 2021.

DOI: 10.3389/fpsyg.2021.769509
PMID: 34819900
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8606411/
Abstract

This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network architecture for multimodal information, based on motion energy, is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, along with the depth modality's robustness to luminance and observation angle; multimodal fusion is accomplished through the complementary information of the two modalities. To better model long-range temporal structure while accounting for action classes that share sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed: the convolutional network shares local features of the two modalities in the shallow layers and, by connecting the feature maps of multiple convolutional layers, also fuses global features in the deep layers. First, a Kinect camera is used to acquire color images, depth images, and the 3D coordinates of skeletal points via the OpenPose open-source framework. Then, keyframes are extracted automatically based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, local occupancy pattern and HSV color space features are extracted to describe the object, and finally feature fusion is performed to complete the complex action recognition task.
To solve the consistency problem of virtual-reality fusion, the mapping relationship between hand joint coordinates and the virtual scene is determined in the augmented reality scene, and a coordinate consistency model linking the natural hand to the virtual model is established. Real-time interaction between hand gestures and the virtual model is thereby realized; the average recognition accuracy for hand gestures reaches 99.04%, improving both the robustness and the real-time performance of gesture-based interaction.
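The keyframe extraction step described in the abstract (selecting frames by the hand-head distance) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the joint-dictionary frame format, the local-extremum selection rule, and both function names are assumptions.

```python
import math

def hand_head_distance(frame):
    """Euclidean distance between the hand and head joints of one
    skeleton frame, given as a dict of joint name -> (x, y, z)."""
    return math.dist(frame["hand"], frame["head"])

def extract_keyframes(frames):
    """Return indices of frames where the hand-head distance is a
    local extremum (the motion changes direction) -- a simple
    stand-in for a distance-based keyframe rule."""
    dists = [hand_head_distance(f) for f in frames]
    keyframes = []
    for i in range(1, len(frames) - 1):
        # Sign change of the discrete derivative marks a peak/valley.
        if (dists[i] - dists[i - 1]) * (dists[i + 1] - dists[i]) < 0:
            keyframes.append(i)
    return keyframes
```

In practice the frames would come from per-frame Kinect/OpenPose skeleton output; here any sequence of joint dictionaries works.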
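The coordinate consistency model between the natural hand and the virtual model can likewise be illustrated with a minimal sketch, assuming a simple per-axis scale-plus-offset mapping between camera space and the virtual scene; the class name and calibration scheme are hypothetical, not taken from the paper.

```python
class HandToSceneMapping:
    """Maps hand joint coordinates from camera space into virtual-scene
    space with a per-axis scale and offset -- a simplified stand-in for
    a coordinate consistency model in an AR scene."""

    def __init__(self, scale, offset):
        self.scale = scale    # (sx, sy, sz), from calibration
        self.offset = offset  # (ox, oy, oz), from calibration

    def to_scene(self, joint):
        """Camera-space (x, y, z) -> virtual-scene coordinates."""
        return tuple(s * c + o for s, c, o in zip(self.scale, joint, self.offset))

    def to_camera(self, point):
        """Inverse mapping, so virtual objects can be anchored to the hand."""
        return tuple((p - o) / s for p, o, s in zip(point, self.offset, self.scale))
```

With such a mapping fixed per session, each tracked hand joint can be projected into the scene every frame, which is what makes real-time gesture/model interaction possible.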


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/5db5603c8dd9/fpsyg-12-769509-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/cb299b9695ce/fpsyg-12-769509-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/fe971d6846cf/fpsyg-12-769509-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/efa67294b0d7/fpsyg-12-769509-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/19f34b259f98/fpsyg-12-769509-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/5089534379d9/fpsyg-12-769509-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/b65a0b34280b/fpsyg-12-769509-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/16ff1a6ad18e/fpsyg-12-769509-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/437e/8606411/84fa62f259fc/fpsyg-12-769509-g009.jpg

Similar Articles

1. Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement.
Front Psychol. 2021 Nov 8;12:769509. doi: 10.3389/fpsyg.2021.769509. eCollection 2021.
2. Multi-Scale Attention 3D Convolutional Network for Multimodal Gesture Recognition.
Sensors (Basel). 2022 Mar 21;22(6):2405. doi: 10.3390/s22062405.
3. Energy-Guided Temporal Segmentation Network for Multimodal Human Action Recognition.
Sensors (Basel). 2020 Aug 19;20(17):4673. doi: 10.3390/s20174673.
4. Attentive 3D-Ghost Module for Dynamic Hand Gesture Recognition with Positive Knowledge Transfer.
Comput Intell Neurosci. 2021 Nov 18;2021:5044916. doi: 10.1155/2021/5044916. eCollection 2021.
5. A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera.
Sensors (Basel). 2020 Mar 25;20(7):1825. doi: 10.3390/s20071825.
6. CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation.
Sensors (Basel). 2021 Sep 11;21(18):6095. doi: 10.3390/s21186095.
7. Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition.
IEEE Trans Pattern Anal Mach Intell. 2016 Aug;38(8):1583-97. doi: 10.1109/TPAMI.2016.2537340. Epub 2016 Mar 2.
8. GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban Scene Semantic Segmentation.
IEEE Trans Image Process. 2021;30:7790-7802. doi: 10.1109/TIP.2021.3109518. Epub 2021 Sep 14.
9. Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry.
Med Image Anal. 2021 May;70:101994. doi: 10.1016/j.media.2021.101994. Epub 2021 Feb 7.
10. MFA-Net: Motion Feature Augmented Network for Dynamic Hand Gesture Recognition from Skeletal Data.
Sensors (Basel). 2019 Jan 10;19(2):239. doi: 10.3390/s19020239.

Cited By

1. HAVIT: research on vision-language gesture interaction mechanism for smart furniture.
Sci Rep. 2025 Jul 28;15(1):27423. doi: 10.1038/s41598-025-10758-9.
2. Improved spatial-temporal graph convolutional networks for upper limb rehabilitation assessment based on precise posture measurement.
Front Neurosci. 2023 Jul 11;17:1219556. doi: 10.3389/fnins.2023.1219556. eCollection 2023.
3. The Psychology Analysis for Post-production of College Students' Short Video Communication Education Based on Virtual Image and Internet of Things.
Front Psychol. 2022 Mar 25;13:781802. doi: 10.3389/fpsyg.2022.781802. eCollection 2022.

References

1. AI, visual imagery, and a case study on the challenges posed by human intelligence tests.
Proc Natl Acad Sci U S A. 2020 Nov 24;117(47):29390-29397. doi: 10.1073/pnas.1912335117.
2. Multimodal affect analysis of psychodynamic play therapy.
Psychother Res. 2021 Mar;31(3):402-417. doi: 10.1080/10503307.2020.1839141. Epub 2020 Nov 5.
3. Touch? Speech? or Touch and Speech? Investigating Multimodal Interaction for Visual Network Exploration and Analysis.
IEEE Trans Vis Comput Graph. 2020 Jun;26(6):2168-2179. doi: 10.1109/TVCG.2020.2970512. Epub 2020 Jan 31.