Ramos Leo Thomas, Sappa Angel D
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, 08193, Spain.
Kauel Inc., Menlo Park, Silicon Valley, CA, 94025, USA.
Sci Rep. 2025 Jan 4;15(1):784. doi: 10.1038/s41598-024-84795-1.
In this study, we explore an enhancement to the U-Net architecture that integrates SK-ResNeXt as the encoder for Land Cover Classification (LCC) with Multispectral Imaging (MSI). SK-ResNeXt introduces cardinality and adaptive kernel sizes, allowing U-Net to better capture multi-scale features and adjust more effectively to variations in spatial resolution, thereby improving the model's ability to segment complex land cover types. We evaluate this approach on the Five-Billion-Pixels dataset, composed of 150 large-scale RGB-NIR images and over 5 billion labeled pixels across 24 categories. The approach improves on the baseline U-Net, with gains of 5.312% in Overall Accuracy (OA) and 8.906% in mean Intersection over Union (mIoU) when using the RGB configuration. With the RG-NIR configuration, the gains are 6.928% in OA and 6.938% in mIoU, while the RGB-NIR configuration yields gains of 5.854% in OA and 7.794% in mIoU. Furthermore, the approach outperforms other well-established models such as DeepLabV3, DeepLabV3+, Ma-Net, SegFormer, and PSPNet, particularly with the RGB-NIR configuration, and also surpasses recent state-of-the-art methods. Visual tests confirm this superiority, showing notable improvements in certain classes, such as lakes, rivers, industrial areas, residential areas, and vegetation, where the other architectures struggle to achieve accurate segmentation. These results demonstrate the potential of the explored approach to handle MSI effectively and improve LCC results.
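The two metrics reported above, OA and mIoU, can both be derived from a pixel-level confusion matrix. The following is a minimal sketch of the standard definitions and is not taken from the paper: the function names are hypothetical, and averaging IoU only over classes that occur in the reference or the prediction is an assumed convention that may differ from the authors' evaluation protocol.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: correctly labeled pixels divided by all labeled pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # reference c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, reference otherwise
        union = tp + fp + fn
        if union:  # assumption: skip classes absent from both reference and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three classes (illustrative values only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

In the toy example the two metrics diverge (OA is 4/6, mIoU is 0.5) because mIoU weights every class equally regardless of pixel count, which is why both are typically reported for datasets with imbalanced land cover classes.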