Kohler Thomas, Batz Michel, Naderi Farzad, Kaup Andre, Maier Andreas, Riess Christian
IEEE Trans Pattern Anal Mach Intell. 2020 Nov;42(11):2944-2959. doi: 10.1109/TPAMI.2019.2917037. Epub 2019 May 16.
Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. Toward bridging this simulated-to-real gap, we introduce the Super-Resolution Erlangen (SupER) database, the first comprehensive laboratory SR database of all-real acquisitions with pixel-wise ground truth. It consists of more than 80k images of 14 scenes combining different facets: CMOS sensor noise, real sampling at four resolution levels, nine scene motion types, two photometric conditions, and lossy video coding at five levels. As such, the database exceeds existing benchmarks by an order of magnitude in quality and quantity. This paper also benchmarks 19 popular single-image and multi-frame algorithms on our data. The benchmark comprises a quantitative study by exploiting ground truth data and qualitative evaluations in a large-scale observer study. We also rigorously investigate agreements between both evaluations from a statistical perspective. One interesting result is that top-performing methods on simulated data may be surpassed by others on real data. Our insights can spur further algorithm development, and the publicly available dataset can foster future evaluations.
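To make concrete what "simulated data artificially sampled from ground truth images" can mean, the following is a minimal sketch and not the authors' pipeline. It assumes block averaging as the simulated sampling model, nearest-neighbor upsampling as a trivial baseline, and PSNR as the full-reference measure; the abstract names none of these, and the function names `bin_downsample`, `upsample_nearest`, and `psnr` are hypothetical.

```python
import math


def bin_downsample(image, factor):
    """Simulate a low-resolution frame by averaging factor x factor blocks.

    Assumption: block averaging stands in for the simulated sampling model.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    low_res = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        low_res.append(row)
    return low_res


def upsample_nearest(image, factor):
    """Trivial baseline: repeat every pixel factor times in both directions."""
    return [[value for value in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    errors = [(r - e) ** 2
              for ref_row, est_row in zip(reference, estimate)
              for r, e in zip(ref_row, est_row)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Toy 4x4 grayscale "ground truth" image (illustrative values only).
ground_truth = [[0, 0, 255, 255],
                [0, 0, 255, 255],
                [0, 255, 0, 255],
                [255, 0, 255, 0]]

low_res = bin_downsample(ground_truth, 2)   # simulated low-resolution input
baseline = upsample_nearest(low_res, 2)     # stand-in for an SR method
score = psnr(ground_truth, baseline)        # score against ground truth
```

In a simulated study of this kind, the low-resolution input is derived from the ground truth itself, so it carries no real sensor noise, motion, or compression. The abstract argues that this is why such evaluations can overestimate performance on real acquisitions.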