Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia.
Department of Electrical and Computer Engineering, Faculty of Engineering, Universitas Syiah Kuala, Kopelma Darussalam 23111, Indonesia.
Sensors (Basel). 2022 Sep 28;22(19):7384. doi: 10.3390/s22197384.
In general, most existing convolutional neural network (CNN)-based deep-learning models suffer from spatial-information loss and inadequate feature representation, because they cannot capture multiscale contextual information and because pooling operations discard semantic information. In the early layers of a CNN, the network encodes simple semantic representations, such as edges and corners, while in the later layers it encodes more complex semantic features, such as complex geometric shapes. In theory, a CNN should extract features from several levels of semantic representation, because tasks such as classification and segmentation perform better when both simple and complex feature maps are utilized. Hence, it is also crucial to embed multiscale capability throughout the network so that features at various scales can be captured optimally for the intended task. Multiscale representation enables the network to fuse low-level and high-level features, even from a restricted receptive field, to enhance the deep model's performance. The main novelty of this review is a comprehensive new taxonomy of multiscale deep-learning methods, which details several architectures implemented in existing works and their strengths. Broadly, multiscale approaches in deep-learning networks can be classified into two categories: multiscale feature learning and multiscale feature fusion. Multiscale feature learning derives feature maps by applying kernels of several sizes, so as to collect a wider range of relevant features and predict the spatial mapping of the input images. Multiscale feature fusion combines features at different resolutions to find patterns over short and long distances without requiring a very deep network. Additionally, several examples of these techniques are discussed according to their applications in satellite imagery, medical imaging, agriculture, and industrial and manufacturing systems.
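As a minimal sketch of the first category (assuming a PyTorch setting; the module and parameter names here are illustrative, not taken from the review), multiscale feature learning can be realized by running parallel convolutions with different kernel sizes over the same input and concatenating the results, in the spirit of Inception-style blocks:

```python
import torch
import torch.nn as nn

class MultiScaleConvBlock(nn.Module):
    """Multiscale feature learning: parallel convolutions with different
    kernel sizes over the same input, concatenated so the block captures
    fine and coarse context simultaneously."""

    def __init__(self, in_channels: int, branch_channels: int):
        super().__init__()
        # One branch per kernel size; padding k // 2 keeps the spatial size
        # equal so branch outputs can be concatenated along channels.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, branch_channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([branch(x) for branch in self.branches], dim=1)

x = torch.randn(1, 64, 56, 56)       # one 64-channel feature map
block = MultiScaleConvBlock(64, 32)  # four branches -> 128 output channels
print(block(x).shape)                # torch.Size([1, 128, 56, 56])
```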
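The second category, multiscale feature fusion, can likewise be sketched as a top-down merge of a coarse, semantically rich map with a finer, spatially detailed one, in the spirit of feature-pyramid networks; again, this is an illustrative assumption of the setting, not an architecture prescribed by the review:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Multiscale feature fusion: upsample a low-resolution, high-level map
    to the resolution of a high-resolution, low-level map, then sum it with
    a 1x1-projected lateral connection and smooth the result."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        self.lateral = nn.Conv2d(low_channels, out_channels, kernel_size=1)
        self.project = nn.Conv2d(high_channels, out_channels, kernel_size=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Bring the coarse map to the fine map's resolution, then fuse.
        high_up = F.interpolate(self.project(high), size=low.shape[-2:], mode="nearest")
        return self.smooth(self.lateral(low) + high_up)

low = torch.randn(1, 256, 64, 64)   # early-layer map: fine spatial detail
high = torch.randn(1, 512, 32, 32)  # late-layer map: coarse semantics
fused = TopDownFusion(256, 512, 256)(low, high)
print(fused.shape)                  # torch.Size([1, 256, 64, 64])
```

The fused map combines short-range detail from the early layers with long-range semantic context from the late layers, which is why such fusion can match patterns over both distances without requiring a very deep network.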