Suppr 超能文献

Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification.

Author Information

Swathi H Y, Shivakumar G

Affiliations

Department of Electronics and Communication Engineering, Malnad College of Engineering, Visvesvaraya Technological University, Belagavi, India.

Department of Electronics and Communication Engineering, AMC Engineering College, Visvesvaraya Technological University, Belagavi, India.

Publication Information

Math Biosci Eng. 2023 May 25;20(7):12529-12561. doi: 10.3934/mbe.2023558.

DOI: 10.3934/mbe.2023558
PMID: 37501454
Abstract

The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud-computing technologies has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, confines the majority of existing vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for different crowd types under extreme conditions is a highly complex task. Consequently, it results in lower accuracy and hence lower reliability, which limits existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Motivated by this, in this research a novel audio-visual multi-modality driven hybrid feature learning model is developed for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features using the Gray-Level Co-occurrence Matrix (GLCM) and an AlexNet transfer-learning model. After extracting the different GLCM features and AlexNet deep features, horizontal concatenation was performed to fuse the different feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed with static (fixed-size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope and Harmonics-to-Noise Ratio (HNR).
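The audio pre-processing chain described above (pre-emphasis, block framing, Hann windowing) can be sketched in a few lines of NumPy. The frame length, hop size and pre-emphasis coefficient below are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def frame_audio(x, frame_len=1024, hop=512, alpha=0.97):
    """Pre-emphasis, fixed-size block framing and Hann windowing.

    `frame_len`, `hop` and `alpha` are hypothetical choices for
    illustration; the paper fixes its own sampling parameters.
    """
    # Pre-emphasis filter: y[n] = x[n] - alpha * x[n-1], boosts high frequencies
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # Block framing: slice the signal into overlapping fixed-size frames
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Hann windowing: taper frame edges to reduce spectral leakage
    return frames * np.hanning(frame_len)

sig = np.random.default_rng(0).standard_normal(8000)
frames = frame_audio(sig)
print(frames.shape)  # (14, 1024)
```

Features such as MFCC or GTCC would then be computed per windowed frame (e.g. via an FFT and a mel or gammatone filter bank).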
Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which was processed for classification using a random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57%, and an F-measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks.
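The fusion step, horizontal concatenation of the per-modality feature sets into one composite vector per sample, is a single array operation. The feature dimensionalities below are made up for illustration (the record here does not state the actual sizes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-clip feature matrices (100 samples each);
# the column counts are illustrative, not from the paper.
glcm_feats    = rng.standard_normal((100, 20))    # GLCM texture features
alexnet_feats = rng.standard_normal((100, 4096))  # AlexNet deep features
audio_feats   = rng.standard_normal((100, 60))    # GTCC/MFCC/spectral features

# Horizontal concatenation: fuse the modalities column-wise into one
# composite multi-modal feature vector per sample.
fused = np.concatenate([glcm_feats, alexnet_feats, audio_feats], axis=1)
print(fused.shape)  # (100, 4176)
```

In the paper, a matrix like `fused` is then fed to a random forest ensemble classifier; scikit-learn's `RandomForestClassifier` would be one way to reproduce that final step.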


Similar Articles

1. Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification.
Math Biosci Eng. 2023 May 25;20(7):12529-12561. doi: 10.3934/mbe.2023558.
2. On effective cognitive state classification using novel feature extraction strategies.
Cogn Neurodyn. 2021 Dec;15(6):1125-1155. doi: 10.1007/s11571-021-09688-9. Epub 2021 Jun 22.
3. Hybrid and Deep Learning Approach for Early Diagnosis of Lower Gastrointestinal Diseases.
Sensors (Basel). 2022 May 27;22(11):4079. doi: 10.3390/s22114079.
4. Multi-Person Tracking and Crowd Behavior Detection via Particles Gradient Motion Descriptor and Improved Entropy Classifier.
Entropy (Basel). 2021 May 18;23(5):628. doi: 10.3390/e23050628.
5. Multilevel hybrid handcrafted feature extraction based depression recognition method using speech.
J Affect Disord. 2024 Nov 1;364:9-19. doi: 10.1016/j.jad.2024.08.002. Epub 2024 Aug 9.
6. Multi-Modal Residual Perceptron Network for Audio-Video Emotion Recognition.
Sensors (Basel). 2021 Aug 12;21(16):5452. doi: 10.3390/s21165452.
7. DCNN for Pig Vocalization and Non-Vocalization Classification: Evaluate Model Robustness with New Data.
Animals (Basel). 2024 Jul 9;14(14):2029. doi: 10.3390/ani14142029.
8. High-Level CNN and Machine Learning Methods for Speaker Recognition.
Sensors (Basel). 2023 Mar 25;23(7):3461. doi: 10.3390/s23073461.
9. Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition.
Entropy (Basel). 2021 Nov 12;23(11):1502. doi: 10.3390/e23111502.
10. Deep feature classification of angiomyolipoma without visible fat and renal cell carcinoma in abdominal contrast-enhanced CT images with texture image patches and hand-crafted feature concatenation.
Med Phys. 2018 Apr;45(4):1550-1561. doi: 10.1002/mp.12828. Epub 2018 Mar 25.