Behavioral profiling for adaptive video summarization: From generalization to personalization.

Authors

Kadam Payal, Vora Deepali, Patil Shruti, Mishra Sashikala, Khairnar Vaishali

Affiliations

Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, Maharashtra, India.

Bharati Vidyapeeth (Deemed to be University) College of Engineering, Pune, India.

Publication

MethodsX. 2024 Jun 14;13:102780. doi: 10.1016/j.mex.2024.102780. eCollection 2024 Dec.

Abstract

In today's multimedia landscape, the sheer volume of CCTV footage poses challenges for storage, accessibility, and efficient navigation. To tackle these issues, we propose a comprehensive video summarization technique that merges machine-learning methods with user engagement. Our methodology consists of two phases, each improving a different aspect of video summarization. In Phase I we introduce a summarization method based on keyframe detection and behavioral analysis, using YOLOv5 for object recognition, Deep SORT for object tracking, and a Single Shot Detector (SSD) to create the video summaries. In Phase II we present a user-interest-based video summarization system driven by machine learning: by incorporating user preferences into the summarization process, we extend these techniques with personalized content curation. Tools such as NLTK, OpenCV, TensorFlow, and the EfficientDet model enable the system to generate video summaries tailored to individual preferences. This approach not only improves user interaction but also efficiently handles the overwhelming amount of video data on digital platforms. By combining the two methodologies, we advance the application of machine-learning techniques while offering a practical solution to the complex challenges of managing multimedia data.
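As a concrete illustration of the Phase I idea, the sketch below samples frames from a CCTV clip, runs YOLOv5 on each sampled frame, and keeps frames with enough confident detections as keyframes for a short summary clip. This is a minimal sketch, not the authors' implementation: the yolov5s checkpoint, the sampling stride, the confidence threshold, and the file names are illustrative assumptions, and the Deep SORT tracking and SSD summary stages described in the abstract are omitted.

```python
# Minimal Phase I sketch: keyframe selection by detection activity.
# Assumptions (not from the paper): yolov5s via torch.hub, stride of 30 frames,
# and "at least 2 confident detections" as the keyframe criterion.
import cv2
import torch

def extract_keyframes(video_path, stride=30, min_detections=2, conf_thresh=0.4):
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    cap = cv2.VideoCapture(video_path)
    keyframes, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            # YOLOv5 expects RGB images; OpenCV reads frames as BGR.
            results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            dets = results.xyxy[0]  # rows: [x1, y1, x2, y2, conf, class]
            if int((dets[:, 4] > conf_thresh).sum()) >= min_detections:
                keyframes.append((idx, frame))
        idx += 1
    cap.release()
    return keyframes

def write_summary(keyframes, out_path, fps=2):
    # Stitch the selected keyframes into a short summary clip.
    if not keyframes:
        return
    h, w = keyframes[0][1].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for _, frame in keyframes:
        writer.write(cv2.resize(frame, (w, h)))
    writer.release()

if __name__ == "__main__":
    frames = extract_keyframes("cctv_clip.mp4")  # hypothetical input file
    write_summary(frames, "summary.mp4")
```

The Phase II notion of user-interest-driven summarization can be sketched in the same spirit: derive interest keywords from a free-text user preference (the abstract mentions NLTK) and keep only keyframes whose detected object classes match those keywords. The direct keyword-to-class matching below is a simplifying assumption; the paper's system additionally relies on TensorFlow and EfficientDet.

```python
# Hedged Phase II sketch: filter keyframes by user-interest keywords.
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def interest_keywords(user_query):
    # Simple keyword extraction from a free-text preference, e.g. "show me cars and people".
    return {w for w in user_query.lower().split() if w.isalpha() and w not in STOP}

def filter_by_interest(keyframes, classes_per_frame, keywords):
    # classes_per_frame: one set of detected class names per keyframe (assumed available
    # from the detector); keep keyframes whose classes overlap the user's keywords.
    return [kf for kf, classes in zip(keyframes, classes_per_frame) if classes & keywords]
```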
