Ghozia Ahmed, Attiya Gamal, Adly Emad, El-Fishawy Nawal
Computer Science and Engineering Department, Faculty of Electronic Engineering, Menoufia University, Shibin El Kom, Menofia Governorate, Egypt.
Comput Intell Neurosci. 2020 Dec 23;2020:8813089. doi: 10.1155/2020/8813089. eCollection 2020.
Understanding video files is a challenging task. Current video understanding techniques rely on deep learning, but their results lack trustworthy meaning. Deep learning recognizes patterns in big data, which leads to deep feature abstraction, not deep understanding. It tries to understand a multimedia production by analyzing its content, yet the semantics of a multimedia file cannot be understood from its content alone. Events occurring in a scene take their meaning from the context that contains them: a screaming kid could be scared of a threat, surprised by a lovely gift, or just playing in the backyard. Artificial intelligence is a heterogeneous process that goes beyond learning. In this article, we discuss the heterogeneity of AI as a process that includes innate knowledge, approximations, and context awareness. We present a context-aware video understanding technique that makes the machine intelligent enough to understand the message behind the video stream. The main purpose is to understand the video stream by extracting meaningful concepts, emotions, temporal data, and spatial data from the video context. The diffusion of heterogeneous data patterns from the video context leads to accurate decision-making about the video message and outperforms systems that rely on deep learning. Objective and subjective comparisons demonstrate the accuracy of the concepts extracted by the proposed context-aware technique relative to current deep learning video understanding techniques. The two systems are compared in terms of retrieval time, computing time, data size consumption, and complexity analysis. The comparisons show significantly more efficient resource usage by the proposed context-aware system, which makes it a suitable solution for real-time scenarios. Moreover, we discuss the pros and cons of deep learning architectures.