Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK.
Conserv Biol. 2023 Feb;37(1):e13992. doi: 10.1111/cobi.13992. Epub 2022 Oct 13.
Assessing species' extinction risk is vital to setting conservation priorities. However, assessment endeavors, such as those used to produce the IUCN Red List of Threatened Species, have significant gaps in taxonomic coverage. Automated assessment (AA) methods are gaining popularity to fill these gaps. Choices made in developing, using, and reporting results of AA methods could hinder their successful adoption or lead to poor allocation of conservation resources. We explored how choice of data cleaning type and level, taxonomic group, training sample, and automation method affect performance of threat status predictions for plant species. We used occurrences from the Global Biodiversity Information Facility (GBIF) to generate assessments for species in 3 taxonomic groups based on 6 different occurrence-based AA methods. We measured each method's performance and coverage following increasingly stringent occurrence cleaning. Automatically cleaned data from GBIF performed comparably to occurrence records cleaned manually by experts. However, all types of data cleaning limited the coverage of AAs. Overall, machine-learning-based methods performed well across taxa, even with minimal data cleaning. Results suggest a machine-learning-based method applied to minimally cleaned data offers the best compromise between performance and species coverage. However, optimal data cleaning, training sample, and automation methods depend on the study group, intended applications, and expertise.
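The trade-off between performance and coverage described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Species` record, the choice of accuracy and sensitivity as performance metrics, and the toy species lists are hypothetical and are not taken from the study, which does not specify its metrics in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Species:
    """Hypothetical record for one species (not the study's data model)."""
    name: str
    threatened: bool           # reference status, e.g. from a published assessment
    predicted: Optional[bool]  # None when cleaning left too few records to predict


def coverage(species: list[Species]) -> float:
    """Fraction of species for which a prediction could be made at all."""
    return sum(s.predicted is not None for s in species) / len(species)


def accuracy(species: list[Species]) -> float:
    """Fraction of predicted species whose prediction matches the reference."""
    assessed = [s for s in species if s.predicted is not None]
    return sum(s.predicted == s.threatened for s in assessed) / len(assessed)


def sensitivity(species: list[Species]) -> float:
    """Fraction of truly threatened, predicted species flagged as threatened."""
    positives = [s for s in species if s.predicted is not None and s.threatened]
    return sum(bool(s.predicted) for s in positives) / len(positives)


# Toy data (invented): the same five species under two cleaning levels.
minimal_cleaning = [
    Species("sp_a", True, True),
    Species("sp_b", True, False),
    Species("sp_c", False, False),
    Species("sp_d", False, False),
    Species("sp_e", True, True),
]
strict_cleaning = [
    Species("sp_a", True, True),
    Species("sp_b", True, None),   # all records removed by cleaning
    Species("sp_c", False, False),
    Species("sp_d", False, None),  # all records removed by cleaning
    Species("sp_e", True, True),
]

for label, data in [("minimal", minimal_cleaning), ("strict", strict_cleaning)]:
    print(label, coverage(data), accuracy(data), sensitivity(data))
```

In this toy case stricter cleaning raises accuracy on the species that remain but leaves two of five species without any prediction, which is the kind of compromise the abstract refers to.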