Department of Computer Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering (FTE), CHARUSAT Campus, Charotar University of Science and Technology (CHARUSAT), Changa 388421, India.
Parul University, Vadodara 382030, Gujarat, India.
Sensors (Basel). 2022 Feb 24;22(5):1780. doi: 10.3390/s22051780.
Object recognition is widely used as a result of increasing CCTV surveillance and the need for automatic object or activity detection from images or video. The growing use of sensor networks has also raised the need for lightweight processing frameworks. Much research has been carried out in this area, but the scope remains large because it involves open-ended problems such as achieving high accuracy in little time with lightweight processing frameworks. Convolutional Neural Networks (CNNs) and their variants are widely used in computer vision tasks, but most CNN architectures are application-specific, so there is a continuing need for generic architectures with better performance. This paper introduces the Dimension-Based Generic Convolution Block (DBGC), which can be used with any CNN to make the architecture generic and to provide dimension-wise selection of height, width, and depth kernels. This single unit, which uses the separable convolution concept, provides multiple combinations of dimension-based kernels. It can be used for a single dimension (height, width, or depth), for any pair of dimensions (height and width, width and depth, or depth and height), or for all three dimensions together. The main novelty of DBGC lies in the dimension selector block included in the proposed architecture. The proposed unoptimized kernel dimensions reduce FLOPs by around one third but also reduce accuracy by around one half; semi-optimized kernel dimensions yield almost the same or higher accuracy with half the FLOPs of the original architecture; and optimized kernel dimensions provide 5 to 6% higher accuracy with a reduction of around 10 M FLOPs.
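To illustrate why dimension-wise separable kernels can lower the operation count, the following is a minimal sketch that compares multiply-accumulate (MAC) counts for a dense k x k convolution against a stack of one-dimensional kernels chosen by a selector. It is not the DBGC implementation: the function names, the kernel shapes assigned to each dimension, and the layer sizes are all assumptions made for illustration, and the abstract does not specify how the dimension selector is built.

```python
# Hypothetical sketch, not the paper's DBGC code. It only counts
# multiply-accumulates (MACs) under assumed kernel shapes.

VALID_DIMS = {"height", "width", "depth"}


def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a dense k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def dimension_wise_macs(h, w, c_in, c_out, k, dims):
    """MACs for 1-D kernels picked by an assumed dimension selector.

    dims is a subset of {"height", "width", "depth"}:
      height -> k x 1 kernel applied to each input channel separately
      width  -> 1 x k kernel applied to each input channel separately
      depth  -> 1 x 1 kernel mixing c_in channels into c_out channels
    Without "depth" the channel count stays at c_in (an assumption).
    """
    unknown = set(dims) - VALID_DIMS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    total = 0
    if "height" in dims:
        total += h * w * c_in * k
    if "width" in dims:
        total += h * w * c_in * k
    if "depth" in dims:
        total += h * w * c_in * c_out
    return total


if __name__ == "__main__":
    # Assumed layer: 32 x 32 feature map, 16 -> 32 channels, kernel size 3.
    dense = standard_conv_macs(32, 32, 16, 32, 3)
    for dims in ({"height"}, {"height", "width"}, VALID_DIMS):
        sep = dimension_wise_macs(32, 32, 16, 32, 3, dims)
        print(sorted(dims), sep, f"{sep / dense:.3f} of dense")
```

Under these assumptions, every combination of dimensions costs only a fraction of the dense convolution. The accuracy trade-offs reported in the abstract depend on the actual kernel dimensions chosen and cannot be inferred from a MAC count alone.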