Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

Authors

Mi Chunrong, Huettmann Falk, Guo Yumin, Han Xuesong, Wen Lijia

Affiliations

College of Nature Conservation, Beijing Forestry University, Beijing, China.

EWHALE Lab, Department of Biology and Wildlife, Institute of Arctic Biology, University of Alaska Fairbanks (UAF), Fairbanks, AK, United States.

Publication information

PeerJ. 2017 Jan 12;5:e2849. doi: 10.7717/peerj.2849. eCollection 2017.

Abstract

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions across large undersampled areas, especially from few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (n = 33), White-naped Crane (n = 40), and Black-necked Crane (n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics, the area under the ROC curve (AUC) and the true skill statistic (TSS), were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which to check the model predictions. We found that Random Forest performed best under most assessment methods, provided a better fit to the testing data, and produced better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is on the rise, its potential remains widely underused in conservation, in (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
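
The ensemble forecast and the two accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow or software: the presence/absence labels and the four rows of predicted probabilities are made-up numbers standing in for the four models, the function names are hypothetical, and the 0.5 threshold used for TSS is an assumption, since the abstract does not state how thresholds were chosen.

```python
import numpy as np

def ensemble_mean(predictions):
    """Average predicted probabilities across models (rows = models, columns = sites)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def auc(y_true, scores):
    """Area under the ROC curve: share of presence/absence pairs ranked correctly,
    counting ties as half a correct pair."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def tss(y_true, scores, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1 at a given threshold.
    The default threshold of 0.5 is an assumption made for this sketch."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    sensitivity = np.sum(pred & y_true) / np.sum(y_true)
    specificity = np.sum(~pred & ~y_true) / np.sum(~y_true)
    return sensitivity + specificity - 1.0

# Made-up test data: 1 = presence, 0 = absence at six hypothetical sites.
y = [1, 1, 1, 0, 0, 0]

# Made-up predicted probabilities from four hypothetical models.
preds = [
    [0.9, 0.8, 0.4, 0.3, 0.2, 0.1],
    [0.7, 0.6, 0.6, 0.5, 0.2, 0.3],
    [0.8, 0.4, 0.7, 0.6, 0.1, 0.2],
    [0.6, 0.6, 0.5, 0.2, 0.3, 0.0],
]

ens = ensemble_mean(preds)
print("ensemble probabilities:", ens)
print("ensemble AUC:", auc(y, ens), "TSS:", tss(y, ens))
print("third model AUC:", auc(y, preds[2]), "TSS:", tss(y, preds[2]))
```

AUC is threshold-free, whereas TSS depends on the cutoff that converts probabilities into presence or absence, so the two metrics can rank models differently.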
