Nerea Abrego, Otso Ovaskainen
Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland.
Department of Agricultural Sciences, University of Helsinki, Helsinki, Finland.
Ecol Evol. 2023 Dec 18;13(12):e10784. doi: 10.1002/ece3.10784. eCollection 2023 Dec.
When comparing multiple models of species distribution, models yielding higher predictive performance are clearly to be favored. A more difficult question is how to decide whether even the best model is "good enough". Here, we clarify key choices and metrics related to evaluating the predictive performance of presence-absence models. We use a hierarchical case study to evaluate how four metrics of predictive performance (AUC, Tjur's R², max-Kappa, and max-TSS) relate to each other, to the random and fixed effects parts of the model, to the spatial scale at which predictive performance is measured, and to the cross-validation strategy chosen. We demonstrate that the very same metric can achieve different values for the very same model, even when similar cross-validation strategies are followed, depending on the spatial scale at which predictive performance is measured. Among the metrics, Tjur's R² and max-Kappa generally increase with species' prevalence, whereas AUC and max-TSS are largely independent of prevalence. Thus, Tjur's R² and max-Kappa often reach lower values when measured at the smallest scales considered in the study, while AUC and max-TSS reach similar values across the different spatial levels included in the study. Nevertheless, the metrics provide complementary insights on predictive performance. The very same model may appear excellent or poor not only due to the applied metric, but also due to how exactly predictive performance is calculated, calling for great caution in the interpretation of predictive performance. The most comprehensive evaluation is therefore obtained by combining measures that provide complementary insights. Instead of following simple rules of thumb or focusing on absolute values, we recommend comparing the achieved predictive performance to the researcher's own a priori expectation of how easy it is to make predictions for the question the model is used for.
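For readers who wish to compute the four metrics themselves, the sketch below shows one minimal way to do so in Python from observed presence-absence labels y and predicted occurrence probabilities p. This is not the authors' implementation; the data here are simulated for illustration, and the helper names (tjur_r2, max_over_thresholds, tss) are hypothetical. AUC and Kappa come from scikit-learn, while Tjur's R² (mean predicted probability at presences minus at absences) and TSS (sensitivity + specificity - 1) are computed directly; max-Kappa and max-TSS are obtained by scanning classification thresholds.

```python
# Minimal sketch (illustrative, not the authors' code) of the four
# presence-absence metrics discussed in the abstract.
import numpy as np
from sklearn.metrics import roc_auc_score, cohen_kappa_score

def tjur_r2(y, p):
    # Tjur's R²: mean predicted probability at presences minus at absences.
    return p[y == 1].mean() - p[y == 0].mean()

def tss(y, yhat):
    # True Skill Statistic = sensitivity + specificity - 1.
    sens = (yhat[y == 1] == 1).mean()
    spec = (yhat[y == 0] == 0).mean()
    return sens + spec - 1.0

def max_over_thresholds(y, p, stat):
    # max-Kappa / max-TSS: maximize the statistic over candidate thresholds.
    return max(stat(y, (p >= t).astype(int)) for t in np.unique(p))

# Simulated (hypothetical) data: 0/1 observations and predicted probabilities.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
p = np.clip(0.3 * y + rng.uniform(0.0, 0.7, 200), 0.0, 1.0)

print("AUC      :", roc_auc_score(y, p))
print("Tjur R²  :", tjur_r2(y, p))
print("max-Kappa:", max_over_thresholds(y, p, cohen_kappa_score))
print("max-TSS  :", max_over_thresholds(y, p, tss))
```

Note that AUC and max-TSS are threshold-independent or threshold-optimized rank-based measures, which is consistent with the abstract's observation that they are largely insensitive to species prevalence, whereas Tjur's R² and Kappa depend on prevalence.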