Vignali Sergio, Barras Arnaud G, Arlettaz Raphaël, Braunisch Veronika
Division of Conservation Biology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland.
Forest Research Institute of Baden-Wuerttemberg, Freiburg, Germany.
Ecol Evol. 2020 Sep 30;10(20):11488-11506. doi: 10.1002/ece3.6786. eCollection 2020 Oct.
Balancing model complexity is a key challenge of modern computational ecology, particularly since the spread of machine learning algorithms. Species distribution models are often implemented using a wide variety of machine learning algorithms that can be fine-tuned to achieve the best model prediction while avoiding overfitting. We have released a new R package that aims to facilitate training, tuning, and evaluation of species distribution models in a unified framework. The main innovations of this package are its functions to perform data-driven variable selection and a novel genetic algorithm to tune model hyperparameters. Real-time, interactive charts are displayed during the execution of several functions to help users understand how removing a variable or varying model hyperparameters affects model performance. The package supports three metrics to evaluate model performance: the area under the receiver operating characteristic curve, the true skill statistic, and Akaike's information criterion corrected for small sample sizes. It implements four statistical methods: artificial neural networks, boosted regression trees, maximum entropy modeling, and random forest. Moreover, it includes functions to display the outputs and create a final report. The package therefore represents a new, unified, and user-friendly framework for the still-growing field of species distribution modeling.
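The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the package's code or API: the function names (`auc`, `tss`, `aicc`), the use of a single fixed threshold for the true skill statistic, and the toy scores are all assumptions made for illustration, and the formulas are the standard textbook definitions.

```python
from typing import Sequence


def auc(presence: Sequence[float], background: Sequence[float]) -> float:
    """Rank-based AUC: probability that a presence location scores higher
    than a background location, counting ties as one half."""
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


def tss(presence: Sequence[float], background: Sequence[float],
        threshold: float) -> float:
    """True skill statistic at one threshold (an assumption of this sketch):
    sensitivity + specificity - 1."""
    sensitivity = sum(s >= threshold for s in presence) / len(presence)
    specificity = sum(s < threshold for s in background) / len(background)
    return sensitivity + specificity - 1.0


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Akaike's information criterion corrected for small sample sizes,
    for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical model scores, for illustration only.
presence_scores = [0.9, 0.8, 0.4]
background_scores = [0.7, 0.3, 0.2, 0.1]

auc_value = auc(presence_scores, background_scores)
tss_value = tss(presence_scores, background_scores, threshold=0.5)
aicc_value = aicc(log_likelihood=-10.0, k=2, n=10)
```

Higher AUC and TSS indicate better discrimination, whereas lower AICc indicates a better trade-off between fit and complexity, so a tuning procedure would maximize the first two and minimize the third.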