Li Wenkai, Guo Qinghua
Guangdong Provincial Engineering Research Center for Remote Sensing and Monitoring of Water Environment, School of Geography and Planning, Sun Yat-Sen University, Guangzhou, China.
Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China.
Ecol Evol. 2021 Jul 1;11(15):10192-10206. doi: 10.1002/ece3.7826. eCollection 2021 Aug.
The receiver operating characteristic (ROC) and precision-recall (PR) plots are widely used to evaluate the performance of species distribution models. Plotting ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually unavailable in practice. Plotting ROC/PR curves from presence-only data while treating background data as pseudo-absence data (the PO approach) may give misleading results. In this study, we propose a new approach, the PB approach, which calibrates the ROC/PR curves from presence and background data using user-provided information on a constant c. Here, c is the probability that a species occurrence is detected (labeled); an estimate of c can also be derived from the PB-based ROC/PR plots, provided that a model with good discriminative ability is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB-based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches. Experimental results show that the curves, and the areas under them, produced by the PB approach are more similar to those produced by the PA approach than the curves produced by the PO approach are. The PB-based ROC/PR plots also provided highly accurate estimates of c in our experiments. We conclude that the proposed PB-based ROC/PR plots can be a valuable complement to existing model assessment methods, and that they provide an additional way to estimate the constant c (or species prevalence) from presence and background data.
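The calibration idea summarized above can be illustrated with a minimal sketch. It is not taken from the paper and may differ from the authors' exact formulation. It assumes that background points are a random sample of the landscape, so that the fraction of background points scored above a threshold is a prevalence-weighted mixture of the true and false positive rates. It also assumes that the detection probability (written `c` here) relates to prevalence as `c = n_p / (n_p + prevalence * n_b)`. The function name `pb_calibrated_curves` and its return keys are hypothetical.

```python
import numpy as np

def pb_calibrated_curves(presence_scores, background_scores, c):
    """Sketch of ROC/PR calibration from presence-background scores.

    Assumptions (not from the source abstract):
      * background is a random sample containing presences and absences;
      * c = n_p / (n_p + prevalence * n_b), so prevalence follows from c.
    """
    p = np.sort(np.asarray(presence_scores, dtype=float))
    b = np.sort(np.asarray(background_scores, dtype=float))
    n_p, n_b = p.size, b.size

    # Prevalence implied by c under the assumed sampling design.
    prevalence = n_p * (1.0 - c) / (c * n_b)
    if not 0.0 < prevalence < 1.0:
        raise ValueError("c implies a prevalence outside (0, 1)")

    # Sweep thresholds from high to low so rates are non-decreasing.
    thresholds = np.unique(np.concatenate([p, b]))[::-1]
    # Fraction of presence / background points with score >= threshold.
    tpr = (n_p - np.searchsorted(p, thresholds, side="left")) / n_p
    bpr = (n_b - np.searchsorted(b, thresholds, side="left")) / n_b

    # Mixture identity: bpr = prevalence * tpr + (1 - prevalence) * fpr.
    fpr = (bpr - prevalence * tpr) / (1.0 - prevalence)
    # Sampling noise can push fpr outside [0, 1] or make it non-monotone.
    fpr = np.maximum.accumulate(np.clip(fpr, 0.0, 1.0))

    # Precision = prevalence * tpr / bpr; undefined where bpr is zero.
    precision = np.full(thresholds.shape, np.nan)
    np.divide(prevalence * tpr, bpr, out=precision, where=bpr > 0)
    precision = np.clip(precision, 0.0, 1.0)

    # Area under the calibrated ROC curve by the trapezoidal rule.
    x = np.concatenate([[0.0], fpr, [1.0]])
    y = np.concatenate([[0.0], tpr, [1.0]])
    auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

    return {"thresholds": thresholds, "tpr": tpr, "fpr": fpr,
            "precision": precision, "prevalence": prevalence, "auc": auc}
```

Clipping and the running maximum on `fpr` are design choices made for this sketch so that the curve stays a valid ROC curve when sample sizes are small; they introduce a slight bias near the origin.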