A novel hybrid model for species distribution prediction using probabilistic random forest, principal component analysis and genetic algorithm.

Authors

Adekunle Taiwo A, Ogundoyin Ibrahim K, Akanbi Caleb O

Affiliation

Department of Computer Science, Osun State University, Osogbo, Nigeria.

Publication

PLoS One. 2025 Sep 10;20(9):e0326122. doi: 10.1371/journal.pone.0326122. eCollection 2025.

DOI:10.1371/journal.pone.0326122

PMID:40929112

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12422458/

Abstract

Probabilistic Random Forest (PRF) is an extension of the traditional Random Forest algorithm, one of the machine learning algorithms most frequently used for species distribution modeling (SDM). However, when complex datasets are used to predict the presence or absence of a species, feature extraction is essential for generating optimal predictions, because it affects both the accuracy and the AUC score of the model. In this paper, we integrated Genetic Algorithm (GA) optimization, a technique well known for feature extraction, to enhance the predictive performance of the PRF model. The resulting novel hybrid algorithm, the genetically optimized probabilistic random forest (PRFGA), is designed to predict the distribution of Mastomys natalensis in Nigeria. The model was also compared with existing models that use other dimensionality-reduction and optimization techniques: Principal Component Analysis (PRFPCA), the Grey Wolf Optimizer-optimized backpropagation neural network algorithm (GNNA), and the Butterfly Optimization Algorithm (PRFBOA). These models were evaluated using six performance metrics: accuracy, area under the curve (AUC), sensitivity, specificity, F1 score and precision. We also examined the spatial distributions predicted by the models. The results showed that the predictive performance of PRFGA improved significantly compared with PRFPCA, GNNA and PRFBOA in predicting the presence or absence of Mastomys natalensis from a sample set of presence-only records and pseudo-absences. PRFGA also demonstrated high predictive power for the spatial distribution of Mastomys natalensis presence or absence in Nigeria. GA optimization was integrated into the PRF model because of its recognized ability to address the specific challenges of data uncertainty and high dimensionality in the feature sets used by SDMs.
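The abstract does not give the GA configuration used in PRFGA. The following is a minimal sketch of generic genetic-algorithm feature selection: binary feature masks, tournament selection, single-point crossover, bit-flip mutation and elitism. It assumes that, in a PRFGA-style pipeline, the fitness would be a score (for example AUC) of a PRF model trained on the selected features. Here a hypothetical toy fitness stands in. The names ga_select_features, toy_fitness, RELEVANCE and COST, and all hyperparameter defaults, are illustrative assumptions, not the authors' code.

```python
import random
from typing import Callable

Mask = list[int]  # 1 = keep the feature, 0 = drop it


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],  # higher is better; e.g. a model score (assumption)
    pop_size: int = 20,
    generations: int = 40,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> tuple[Mask, float]:
    """Return the best feature mask evaluated during the run and its fitness."""
    rng = random.Random(seed)
    # Random initial population of feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored: list[tuple[float, Mask]]) -> Mask:
        # Binary tournament: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    best_score, best_mask = float("-inf"), list(population[0])
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_score, gen_mask = max(scored, key=lambda pair: pair[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, list(gen_mask)
        # Elitism: copy the best mask seen so far into the next generation.
        next_pop = [list(best_mask)]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
    return best_mask, best_score


# Hypothetical toy fitness: per-feature "relevance" minus a cost per selected
# feature, standing in for the score of a model trained on the selected features.
RELEVANCE = [0.30, 0.05, 0.25, 0.02, 0.40, 0.01, 0.15, 0.03]
COST = 0.10


def toy_fitness(mask: Mask) -> float:
    return sum(r for r, g in zip(RELEVANCE, mask) if g) - COST * sum(mask)


mask, score = ga_select_features(len(RELEVANCE), toy_fitness)
print(mask, round(score, 2))
```

Elitism is a design choice here: it guarantees that the best mask found so far is never lost to crossover or mutation. For this toy fitness, the optimum keeps exactly the features whose relevance exceeds COST.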

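The abstract lists the metrics used to compare the models. As a hedged illustration, their standard definitions for a binary presence (1) / absence (0) task can be computed as follows. The 0.5 decision threshold, the toy labels and scores, and the function names are assumptions for illustration, not values or code from the study. AUC is computed in its pairwise (Mann-Whitney) form, which equals the area under the ROC curve.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating 1 as presence and 0 as absence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random presence outscores a random absence (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def evaluate(y_true: list[int], scores: list[float], threshold: float = 0.5) -> dict[str, float]:
    # Threshold predicted presence probabilities into 0/1 predictions.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "auc": roc_auc(y_true, scores),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_score": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "precision": precision,
    }


# Toy presence (1) / pseudo-absence (0) labels and predicted presence probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity, precision and F1 depend on the chosen threshold. AUC does not, because it is computed from the raw probability scores.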