Vásquez-Salazar Rubén Darío, Cardona-Mesa Ahmed Alejandro, Gómez Luis, Travieso-González Carlos M, Garavito-González Andrés F, Vásquez-Cano Esteban
Faculty of Engineering, Politécnico Colombiano Jaime Isaza Cadavid, Medellín, 48th Av, 7-151, Colombia.
Faculty of Engineering, Institución Universitaria Digital de Antioquia, Medellín, 55th Av, 42-90, Colombia.
Data Brief. 2024 Jan 15;53:110065. doi: 10.1016/j.dib.2024.110065. eCollection 2024 Apr.
When training Artificial Intelligence and Deep Learning models, especially with Supervised Learning techniques, a labeled dataset is required, pairing each input with its corresponding labeled output. For images, whether the task is classification, segmentation, or other processing, a pair of images is likewise required: one image as the input (the noisy image) and the desired one (the denoised image) as the output. For SAR despeckling applications, the common approach is to take a set of optical images and corrupt them with synthetic noise, since no ground truth is available; the corrupted image serves as the input and the optical one as the noiseless reference (ground truth). In this paper, we provide a dataset based on actual SAR images. The ground truth was obtained from Sentinel-1 SAR images of the same region acquired at different instants of time, which were then processed and merged into a single image that serves as the output of the dataset. Every SAR image (noisy and ground truth) was split into 1600 images of 512 × 512 pixels, for a total of 3200 images. The dataset was further split into 3000 images for training and 200 for validation, all of them available in four labeled folders.
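The tiling step described above can be sketched with a few lines of NumPy. This is a minimal illustration, not the authors' code: the 40 × 40 grid implied by 1600 patches per scene, the exact scene dimensions, and the function name are assumptions, and the demo below uses a small 2 × 2 grid to keep memory low.

```python
import numpy as np

def tile_image(img, tile=512):
    """Split a 2-D image into non-overlapping tile x tile patches.

    Assumes the image dimensions are exact multiples of `tile`;
    a 40 x 40 grid of 512-px tiles would yield the 1600 patches
    per SAR scene described in the paper.
    """
    rows, cols = img.shape
    return (
        img.reshape(rows // tile, tile, cols // tile, tile)
           .swapaxes(1, 2)           # group the two grid axes together
           .reshape(-1, tile, tile)  # flatten grid into a patch stack
    )

# Small stand-in for a SAR scene: a 2 x 2 grid of 512-px tiles.
scene = np.arange(1024 * 1024, dtype=np.float32).reshape(1024, 1024)
patches = tile_image(scene)
print(patches.shape)  # (4, 512, 512)
```

A train/validation split like the paper's 3000/200 division could then be obtained by shuffling patch-pair indices once and writing the two subsets into separate labeled folders, so that each noisy patch stays aligned with its ground-truth counterpart.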