Senay Senait D, Worner Susan P
GEMS™-A CFANS & MSI initiative, University of Minnesota, 305 Cargill Building, 1500 Gortner Avenue, Saint Paul, MN 55108, USA.
Department of Plant Pathology, University of Minnesota, 495 Borlaug Hall, 1991 Upper Buford Circle, Saint Paul, MN 55108, USA.
Insects. 2019 Mar 1;10(3):65. doi: 10.3390/insects10030065.
Correlative species distribution models (SDMs) are increasingly used to predict suitable insect habitats. They are also widely criticized for prediction discrepancies among different SDMs for the same species and for the lack of effective communication about SDM prediction uncertainty. In this paper, we undertook a factorial study to investigate the effects of four modeling components (species training datasets, predictor variables, dimension-reduction methods, and model types) on the accuracy of SDM predictions, with the aim of identifying sources of discrepancy and uncertainty. Of the modeling components tested, model type was the major source of variation in species-distribution predictions. Different combinations of modeling components could also significantly increase or decrease the performance of a model, which indicates the importance of keeping modeling components constant when comparing SDM results. With the other modeling components held constant, machine-learning models seem to outperform other model types. On average, the Hierarchical Non-Linear Principal Components Analysis dimension-reduction method improved model performance more than the other methods tested. Widely used confusion-matrix-based model-performance indices such as the area under the receiver operating characteristic curve (AUC), sensitivity, and Kappa do not necessarily help select the best model from a set of models if variation in performance is not large. To conclude, discrepancies among model results do not necessarily indicate a lack of robustness in correlative modeling, as they can also arise from inappropriate selection of modeling components. In addition, more research on model performance evaluation is required to develop robust and sensitive model evaluation methods. Undertaking multi-scenario species-distribution modeling, where possible, is likely to mitigate errors arising from inappropriate selection of modeling components and to provide end users with better information on the resulting prediction uncertainty.
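The factorial, multi-scenario design described in the abstract can be pictured as a grid of modeling-component combinations. The sketch below is only an illustration, not the authors' code: the level names (`dataset_a`, `glm`, and so on) and the helper functions are hypothetical, since the abstract names the four components but not their levels. It enumerates every combination and then groups scenarios that differ only in model type, which is the kind of like-for-like comparison the abstract recommends.

```python
from itertools import product

# Hypothetical levels for each modeling component. The abstract names the
# four components but not the levels actually used in the study.
components = {
    "training_dataset": ["dataset_a", "dataset_b"],
    "predictor_set": ["predictors_1", "predictors_2"],
    "dimension_reduction": ["none", "pca", "nonlinear_pca"],
    "model_type": ["glm", "random_forest", "svm"],
}


def build_scenarios(components):
    """Return one dict per combination of component levels (full factorial)."""
    names = list(components)
    return [dict(zip(names, levels)) for levels in product(*components.values())]


def matched_scenarios(scenarios, vary):
    """Group scenarios that are identical except for the component `vary`."""
    groups = {}
    for scenario in scenarios:
        key = tuple((k, v) for k, v in scenario.items() if k != vary)
        groups.setdefault(key, []).append(scenario)
    return list(groups.values())


scenarios = build_scenarios(components)
groups = matched_scenarios(scenarios, vary="model_type")
```

Within each group, any difference in performance can be attributed to model type alone, because the training dataset, predictor set, and dimension-reduction method are fixed.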
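For readers unfamiliar with the performance indices named in the abstract, the following minimal sketch shows one standard way to compute sensitivity, Cohen's Kappa, and AUC for presence/absence predictions. It is not taken from the paper: the function names, the 0.5 threshold, and the toy data are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 (presence) and 0 (absence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Proportion of true presences that were predicted as presences."""
    return tp / (tp + fn)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between prediction and observation, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected)


def auc(y_true, scores):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
kappa = cohen_kappa(tp, fp, fn, tn)
area = auc(y_true, scores)
```

Sensitivity and Kappa depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. Two models with similar habitat-suitability rankings can therefore receive nearly identical scores, which is consistent with the abstract's caution about using these indices to separate models of similar performance.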