Prevalence dependence in model goodness measures with special emphasis on true skill statistics.

Author information

Somodi Imelda, Lepesi Nikolett, Botta-Dukát Zoltán

Affiliations

MTA Centre for Ecological Research, Tihany, Hungary.

Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Loránd University, Budapest, Hungary; National Adaptation Centre, Geological and Geophysical Institute of Hungary, Budapest, Hungary.

Publication information

Ecol Evol. 2017 Jan 12;7(3):863-872. doi: 10.1002/ece3.2654. eCollection 2017 Feb.

DOI:10.1002/ece3.2654

PMID:28168023

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5288248/

Abstract

It has long been a concern that performance measures of species distribution models react to attributes of the modeled entity arising from the input data structure rather than to model performance. Thus, the study of Allouche et al. (Journal of Applied Ecology, 43, 1223, 2006) identifying the true skill statistics (TSS) as being independent of prevalence had a great impact. However, empirical experience questioned the validity of the statement. We searched for technical reasons behind these observations. We explored possible sources of prevalence dependence in TSS including sampling constraints and species characteristics, which influence the calculation of TSS. We also examined whether the widespread solution of using the maximum of TSS for comparison among species introduces a prevalence effect. We found that the design of Allouche et al. (Journal of Applied Ecology, 43, 1223, 2006) was flawed, but TSS is indeed independent of prevalence if model predictions are binary and under the strict set of assumptions methodological studies usually apply. However, if we take realistic sources of prevalence dependence, effects appear even in binary calculations. Furthermore, in the widespread approach of using maximum TSS for continuous predictions, the use of the maximum alone induces prevalence dependence for small, but realistic samples. Thus, prevalence differences need to be taken into account when model comparisons are carried out based on discrimination capacity. The sources we identified can serve as a checklist to safely control comparisons, so that true discrimination capacity is compared as opposed to artefacts arising from data structure, species characteristics, or the calculation of the comparison measure (here TSS).
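For readers unfamiliar with the measure, the following is a minimal sketch rather than the authors' code. It assumes the standard definition TSS = sensitivity + specificity - 1 and computes maximum TSS by scanning every observed score as a threshold. The function names `tss_from_counts` and `max_tss` and all numbers are hypothetical illustrations. The first example mirrors the idealized binary case described in the abstract: when sensitivity and specificity are held fixed, TSS is the same at prevalence 0.5 and 0.1.

```python
def tss_from_counts(tp, fn, fp, tn):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of presences predicted present
    specificity = tn / (tn + fp)  # share of absences predicted absent
    return sensitivity + specificity - 1.0


def max_tss(y_true, scores):
    """Largest TSS over thresholds taken from the observed scores.

    y_true holds 0/1 (absence/presence); scores holds continuous
    predictions. Both classes must occur in y_true.
    """
    best = -1.0
    for threshold in sorted(set(scores)):
        tp = fn = fp = tn = 0
        for present, score in zip(y_true, scores):
            predicted = score >= threshold
            if present and predicted:
                tp += 1
            elif present:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
        best = max(best, tss_from_counts(tp, fn, fp, tn))
    return best


# Illustrative counts (not data from the paper): n = 1000 sites,
# sensitivity 0.8 and specificity 0.9 at two different prevalences.
tss_common = tss_from_counts(tp=400, fn=100, fp=50, tn=450)  # prevalence 0.5
tss_rare = tss_from_counts(tp=80, fn=20, fp=90, tn=810)      # prevalence 0.1
print(tss_common, tss_rare)

# Maximum TSS for a toy set of continuous predictions.
print(max_tss([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because `max_tss` picks the best threshold after seeing the data, it is the step the abstract flags as introducing prevalence dependence in small samples. The sketch shows only the calculation and does not reproduce that effect.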

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/787b/5288248/f162e51dc162/ECE3-7-863-g001.jpg
