Abriha Dávid, Srivastava Prashant K, Szabó Szilárd
Department of Physical Geography and Geoinformatics, Faculty of Science and Technology, Institute of Geosciences, University of Debrecen, Egyetem tér 1, Debrecen, 4032, Hungary.
Remote Sensing Laboratory, Institute of Environment and Sustainable Development, Banaras Hindu University, Varanasi, 221005, India.
Heliyon. 2023 Feb 24;9(3):e14045. doi: 10.1016/j.heliyon.2023.e14045. eCollection 2023 Mar.
Deriving the thematic accuracy of models is a fundamental part of image classification analyses. K-fold cross-validation (KCV), as an accuracy assessment technique, can be biased because the built-in algorithms of existing software solutions do not handle the high autocorrelation of remotely sensed images, which leads to overestimated accuracies. We aimed to quantify the magnitude of the overestimation of KCV-based accuracies and to propose a method to overcome this problem, using the example of rooftops in a WorldView-2 (WV2) satellite image and two orthophotos. Random splitting into training/testing subsets, independent testing, and different types of repeated KCV sampling strategies were used to generate input datasets for classification. Results revealed that random splitting of reference data into training/testing subsets and KCV methods significantly biased the accuracies, by up to 17%; overall accuracies (OAs) can incorrectly reach >99%. We found that repeated KCV can provide results similar to independent testing when spatial sampling is applied with a sufficiently large distance threshold (in our case 10 m). The coarser resolution of WV2 ensured more reliable results (an increase of up to 5-9% in OA) than the orthophotos. Object-based pixel purity of buildings showed that, when using a majority filter for at least 50% of objects, the final accuracy approached 100% with each sampling method. The final conclusion is that KCV-based modelling ensures better accuracy than single models (with better pixel purity on the object level), but accuracy metrics obtained without spatially filtered sampling are not reliable.
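The abstract does not describe how the spatially filtered sampling was implemented, so the following is only a minimal sketch of one possible reading: in each fold, training samples that lie closer than a distance threshold to any test sample are dropped, so that neighbouring (autocorrelated) pixels cannot appear on both sides of the split. The function name `spatially_filtered_folds`, its parameters, and the synthetic coordinates are hypothetical and are not taken from the study; only the 10 m threshold comes from the text.

```python
import math
import random


def spatially_filtered_folds(coords, k=5, min_dist=10.0, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    coords   : list of (x, y) tuples in a projected system (e.g. metres)
    min_dist : training samples closer than this to ANY test sample of
               the fold are discarded (assumed reading of the 10 m threshold)
    """
    indices = list(range(len(coords)))
    random.Random(seed).shuffle(indices)
    # Every sample lands in exactly one test fold.
    folds = [indices[i::k] for i in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [
            i for i in indices
            if i not in test_set
            and all(math.dist(coords[i], coords[j]) >= min_dist
                    for j in test_idx)
        ]
        yield sorted(train_idx), sorted(test_idx)


# Synthetic reference points scattered over a 300 m x 300 m area.
rng = random.Random(42)
points = [(rng.uniform(0, 300), rng.uniform(0, 300)) for _ in range(80)]

for fold, (train, test) in enumerate(
        spatially_filtered_folds(points, k=5, min_dist=10.0)):
    print(f"fold {fold}: {len(train)} training, {len(test)} test samples")
```

Repeating the procedure with different `seed` values would give a repeated KCV in this sense. The pairwise distance check is quadratic in the number of samples, which is acceptable for a sketch but would call for a spatial index on large reference datasets.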