Low Pei Jing, Ng Bo Yan, Mahzan Nur Insyirah, Tian Jing, Leung Cheung-Chi
NUS-ISS, National University of Singapore, Singapore 119615, Singapore.
Sensors (Basel). 2025 Jan 4;25(1):255. doi: 10.3390/s25010255.
Recognizing the action of taking a plastic bag in CCTV video footage is a highly specialized and niche challenge within the broader domain of action video classification. To address this challenge, our paper introduces a novel benchmark video dataset curated specifically for identifying the action of grabbing a plastic bag. Additionally, we propose and evaluate three distinct baseline approaches. The first combines handcrafted feature extraction techniques with a sequential classification model to analyze motion- and object-related features. The second leverages a multiple-frame convolutional neural network (CNN) to exploit temporal and spatial patterns in the video data. The third explores a 3D CNN-based deep learning model capable of processing video data as volumetric inputs. To assess the performance of these methods, we conduct a comprehensive comparative study, demonstrating the strengths and limitations of each approach within this specialized domain.
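To make the third baseline concrete, below is a minimal sketch of a 3D CNN video classifier that treats a clip as a volumetric input of shape (channels, frames, height, width). This is an illustrative assumption, not the paper's exact architecture: the use of PyTorch, the layer widths, the 16-frame 112x112 clip size, and the binary output (bag-taking vs. background) are all assumed for the sketch.

```python
# Sketch of a 3D CNN baseline for video action classification.
# Assumptions (not from the paper): PyTorch, 16-frame 112x112 RGB clips,
# two output classes (plastic-bag taking vs. background). Layer sizes
# are illustrative only.
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # spatio-temporal convolution
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool spatially, keep all frames
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                  # pool across time and space
            nn.AdaptiveAvgPool3d(1),                      # global average pool to one vector
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, frames, height, width)
        feats = self.features(x).flatten(1)
        return self.classifier(feats)

# Usage: a batch of two 16-frame RGB clips at 112x112 resolution.
model = Simple3DCNN()
clip = torch.randn(2, 3, 16, 112, 112)
logits = model(clip)   # shape: (2, 2)
```

The key design point, reflected in the abstract, is that 3D convolutions learn temporal and spatial patterns jointly from the volumetric input, whereas the multiple-frame CNN and the handcrafted-feature-plus-sequential-model baselines handle the temporal dimension separately from per-frame appearance.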