Modelling heterogeneity in the classification process in multi-species distribution models can improve predictive performance.

Author information

Adjei Kwaku Peprah, Finstad Anders Gravbrøt, Koch Wouter, O'Hara Robert Brian

Affiliations

Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway.

Center for Biodiversity Dynamics, Norwegian University of Science and Technology, Trondheim, Norway.

Publication information

Ecol Evol. 2024 Mar 7;14(3):e11092. doi: 10.1002/ece3.11092. eCollection 2024 Mar.

Abstract

Species distribution models and maps from large-scale biodiversity data are necessary for conservation management. One current issue is that biodiversity data are prone to taxonomic misclassifications. Methods to account for these misclassifications in multi-species distribution models have assumed that the classification probabilities are constant throughout the study. In reality, classification probabilities are likely to vary with several covariates. Failure to account for such heterogeneity can lead to biased prediction of species distributions. Here, we present a general multi-species distribution model that accounts for heterogeneity in the classification process. The proposed model assumes a multinomial generalised linear model for the classification confusion matrix. We compare the performance of the heterogeneous classification model to that of the homogeneous classification model by assessing how well they estimate the parameters in the model and their predictive performance on hold-out samples. We applied the model to gull data from Norway, Denmark and Finland, obtained from the Global Biodiversity Information Facility. Our simulation study showed that accounting for heterogeneity in the classification process increased the precision of true species' identity predictions by 30% and accuracy and recall by 6%. Since all the models in this study accounted for misclassification of some sort, there was no significant effect of accounting for heterogeneity in the classification process on the inference about the ecological process. Applying the model framework to the gull dataset did not improve the predictive performance between the homogeneous and heterogeneous models (with parametric distributions) due to the smaller misclassified sample sizes. However, when machine learning predictive scores were used as weights to inform the species distribution models about the classification process, the precision increased by 70%. 
We recommend multiple multinomial regression to be used to model the variation in the classification process when the data contains relatively larger misclassified samples. Machine learning prediction scores should be used when the data contains relatively smaller misclassified samples.
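
To make the contrast between homogeneous and heterogeneous classification concrete, the following is a minimal sketch, not the authors' implementation. It assumes a multinomial logit (softmax) link for each row of the confusion matrix and a single hypothetical covariate (for example, a standardised observer-experience score). The function name, parameter values and covariate are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def classification_matrix(intercepts, slopes, covariate):
    """Return a confusion matrix whose rows are true species and whose
    columns are reported species, under an assumed multinomial logit link.

    Each row is a softmax of a linear predictor, so each row sums to 1.
    """
    linear_predictor = intercepts + slopes * covariate
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = linear_predictor - linear_predictor.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


# Hypothetical parameters for two species (illustrative values only).
intercepts = np.array([[2.0, 0.0],
                       [0.0, 2.0]])
slopes = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

# Homogeneous case: zero slopes, so the matrix ignores the covariate.
homogeneous = classification_matrix(intercepts, np.zeros_like(slopes), covariate=1.0)

# Heterogeneous case: correct-classification probability varies with the covariate.
novice = classification_matrix(intercepts, slopes, covariate=-1.0)
expert = classification_matrix(intercepts, slopes, covariate=1.0)

print(np.round(homogeneous, 3))
print(np.round(novice, 3))
print(np.round(expert, 3))
```

In this sketch the diagonal entries (correct classifications) grow with the covariate in the heterogeneous case, while the homogeneous matrix stays the same for every observation. This is the distinction the abstract draws between the two model types.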

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14b4/10918728/ca39d7cb956d/ECE3-14-e11092-g001.jpg
