Unleashing the Potential of Residual and Dual-Stream Transformers for the Remote Sensing Image Analysis.

Author Information

Mittal Priya, Tanwar Vishesh, Sharma Bhisham, Yadav Dhirendra Prasad

Affiliations

Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India.

Centre of Research Impact and Outcome, Chitkara University, Rajpura 140401, Punjab, India.

Publication Information

J Imaging. 2025 May 15;11(5):156. doi: 10.3390/jimaging11050156.

Abstract

The categorization of remote sensing satellite imagery is crucial for various applications, including environmental monitoring, urban planning, and disaster management. Among deep learning techniques, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have exhibited exceptional performance, excelling in feature extraction and representation learning. This paper presents a hybrid dual-stream ResV2ViT model that combines the advantages of the ResNet50V2 and Vision Transformer (ViT) architectures. The dual-stream approach allows the model to extract both local spatial features and global contextual information by processing data through two complementary pathways. The ResNet50V2 component performs hierarchical feature extraction and captures short-range dependencies, whereas the ViT module efficiently models long-range dependencies and global context. After position embedding in the hybrid model, the tokens are bifurcated into two parts, q1 and q2: q1 is passed to the convolutional block to refine local spatial details, while q2 is fed to the Transformer to apply global attention over the spatial features. Combining these two architectures allows the model to acquire both low-level and high-level feature representations, improving classification performance. We assess the proposed ResV2ViT model on the RSI-CB256 dataset and a second dataset with 21 classes. The proposed model attains an average accuracy of 99.91%, with a precision and F1 score of 99.90%, on the first dataset and 98.75% accuracy on the second, illustrating its efficacy in satellite image classification. The findings demonstrate that the dual-stream hybrid ResV2ViT model surpasses traditional CNN- and Transformer-based models, establishing it as a strong framework for remote sensing applications.
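
The abstract describes the architecture only at a high level, so the following is a minimal, self-contained PyTorch sketch of the dual-stream idea: position-embedded tokens are split into q1 (a convolutional stream for local detail) and q2 (a Transformer stream for global attention), then fused for classification. This is not the authors' implementation; the small CNN stem standing in for ResNet50V2, the channel-wise interpretation of the q1/q2 split, the fusion by concatenation, and all dimensions are assumptions made purely for illustration.

```python
# Hypothetical sketch of the dual-stream token split described in the abstract.
# Backbone, split strategy, fusion, and dimensions are assumptions, not the authors' code.
import torch
import torch.nn as nn


class DualStreamBlock(nn.Module):
    """Splits position-embedded tokens into q1 (local conv stream) and q2 (global attention stream)."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        half = dim // 2
        # q1 stream: 1-D convolutions along the token sequence to refine local spatial detail
        self.local_branch = nn.Sequential(
            nn.Conv1d(half, half, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(half, half, kernel_size=3, padding=1),
        )
        # q2 stream: standard Transformer encoder providing global self-attention
        layer = nn.TransformerEncoderLayer(
            d_model=half, nhead=num_heads, dim_feedforward=4 * half, batch_first=True
        )
        self.global_branch = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim), already position-embedded
        q1, q2 = tokens.chunk(2, dim=-1)                        # assumed channel-wise bifurcation
        local = self.local_branch(q1.transpose(1, 2)).transpose(1, 2)
        global_ctx = self.global_branch(q2)
        return torch.cat([local, global_ctx], dim=-1)           # fused (batch, num_tokens, dim)


class ResV2ViTSketch(nn.Module):
    """Toy stand-in for ResV2ViT: a small CNN stem in place of ResNet50V2, then the dual-stream block."""

    def __init__(self, num_classes: int = 21, dim: int = 256):
        super().__init__()
        # CNN stem (placeholder for the ResNet50V2 feature extractor): 256x256 input -> 16x16 feature map
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3), nn.GELU(),
            nn.Conv2d(64, dim, kernel_size=3, stride=4, padding=1), nn.GELU(),
        )
        self.pos_embed = nn.Parameter(torch.zeros(1, 16 * 16, dim))
        self.dual_stream = DualStreamBlock(dim=dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.stem(x)                                    # (B, dim, 16, 16)
        tokens = feats.flatten(2).transpose(1, 2)               # (B, 256, dim) token sequence
        tokens = tokens + self.pos_embed                        # position embedding before the split
        tokens = self.dual_stream(tokens)                       # q1/q2 dual-stream processing
        return self.head(tokens.mean(dim=1))                    # mean-pool tokens, then classify


logits = ResV2ViTSketch()(torch.randn(2, 3, 256, 256))          # e.g. two 256x256 RGB satellite patches
print(logits.shape)                                             # torch.Size([2, 21]) for a 21-class dataset
```

Concatenation followed by a shared classification head is only one plausible way to fuse the two streams; the paper may use a different splitting or fusion scheme.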

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d54/12112853/a17f53f72bfb/jimaging-11-00156-g001.jpg
