Khan Raja Waqar Ahmed, Shaheen Hamayun, Islam Dar Muhammad Ejaz Ul, Habib Tariq, Manzoor Muhammad, Gillani Syed Waseem, Al-Andal Abeer, Ayoola John Oluwafemi, Waheed Muhammad
Department of Botany, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan.
Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan.
BMC Plant Biol. 2025 Jul 15;25(1):915. doi: 10.1186/s12870-025-06937-5.
Himalayan forests are fragile, rich in biodiversity, and increasingly threatened by anthropogenic pressures and climate change. Assessing their health is critical for sustainable forest management. This study integrated ecological indicators (tree density, size, regeneration, deforestation, slope, grazing, and erosion) with machine learning (ML) to classify forest health and identify its key drivers across 37 Western Himalayan sites. Principal component analysis (PCA) reduced data dimensionality and highlighted the major ecological gradients. K-means clustering, chosen for its efficiency in identifying natural patterns in multivariate data, grouped the forests into three distinct classes based on ecological characteristics. Three ML models, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were trained and validated using an 80:20 train-test split and 5-fold cross-validation.
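The workflow described above (PCA, K-means with three clusters, then DT, RF, and SVM evaluated with an 80:20 split and 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the feature names are hypothetical labels for the indicators listed in the abstract, the data are synthetic stand-ins for the 37 sites, and choices such as standardizing the inputs, clustering on the scaled indicators, and the random seeds are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names for the indicators listed in the abstract.
FEATURES = ["tree_density", "tree_size", "regeneration", "deforestation",
            "slope", "grazing", "erosion"]

# Synthetic stand-in for the 37 sites (NOT the study's data): three
# well-separated groups so the example behaves deterministically.
rng = np.random.default_rng(0)
group_sizes = (10, 19, 8)
X_raw = np.vstack([
    rng.normal(loc=5.0 * k, scale=1.0, size=(n, len(FEATURES)))
    for k, n in enumerate(group_sizes)
])

# Standardize so that no indicator dominates because of its units (assumption).
X = StandardScaler().fit_transform(X_raw)

# PCA: share of variance captured by the first three components.
pca = PCA(n_components=3).fit(X)
explained = pca.explained_variance_ratio_.sum()

# K-means with three clusters supplies the class labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 80:20 stratified train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "test_accuracy": model.score(X_test, y_test),
        # 5-fold cross-validation on all sites (stratified for classifiers).
        "cv_scores": cross_val_score(model, X, labels, cv=5),
    }

# Impurity-based feature importance from the fitted Random Forest.
importances = dict(zip(FEATURES, models["RF"].feature_importances_))

for name, res in results.items():
    print(name, round(res["test_accuracy"], 2), round(res["cv_scores"].mean(), 2))
```

Because the class labels here come from clustering the same features the classifiers are trained on, the synthetic example is easy by construction; the scores it prints say nothing about the accuracies reported in the study.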
PCA revealed that elevation, disturbance, and regeneration together explained 74.3% of the variance. Forest health varied across sites, with 10 categorized as healthy, 19 as moderate, and 8 as unhealthy. Forest regeneration was highly skewed (skewness = 2.67) and leptokurtic (kurtosis = 9.8), with only a few sites showing high seedling abundance, while deforestation (mean = 294 stumps per hectare) indicated uneven human impact. Among the ML models, RF performed best, with a mean accuracy of 0.83, Kappa of 0.87, and balanced accuracy of 0.88. SVM followed with an accuracy of 0.75, Kappa of 0.70, and balanced accuracy of 0.81. DT performed worst, with an accuracy of 0.66 and Kappa of 0.45. Cross-validation confirmed that RF had the highest mean accuracy (90.3%), followed by SVM (88.1%) and DT (65.1%). RF-based feature importance analysis identified tree DBH, height, regeneration rate, soil erosion, and tree density as the key ecological drivers of forest health.
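The three evaluation metrics reported above (accuracy, Cohen's Kappa, and balanced accuracy) can be computed from a set of predictions as sketched below. The labels are a small made-up example, with 0, 1, and 2 standing for healthy, moderate, and unhealthy; they are not taken from the study, and the use of scikit-learn is an assumption.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score)

# Toy labels (hypothetical): 0 = healthy, 1 = moderate, 2 = unhealthy.
y_true = [0, 0, 0, 1, 1, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 1, 1, 2]

# Fraction of sites classified correctly: 7 of 8.
acc = accuracy_score(y_true, y_pred)

# Agreement corrected for chance: (p_o - p_e) / (1 - p_e),
# here p_o = 7/8 and p_e = (3*2 + 4*5 + 1*1) / 64 = 27/64.
kappa = cohen_kappa_score(y_true, y_pred)

# Mean of per-class recall: (2/3 + 1 + 1) / 3.
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(f"accuracy={acc:.3f} kappa={kappa:.3f} balanced_accuracy={bal_acc:.3f}")
# prints: accuracy=0.875 kappa=0.784 balanced_accuracy=0.889
```

Balanced accuracy weights each class equally, which matters when class sizes differ as they do here (10, 19, and 8 sites), while Kappa discounts the agreement expected by chance alone.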
This study highlights ML-driven classification as a precise, scalable, and objective tool for large-scale forest health assessment. Conservation efforts should prioritize degraded forests through afforestation, slope stabilization, controlled grazing, erosion management, and continuous ecosystem monitoring. Future studies should integrate high-resolution remote sensing (e.g., Landsat, Sentinel-2) and climate datasets (e.g., temperature, precipitation, and drought indices) to enhance predictive capability and support long-term forest management planning. The findings underscore the value of data-driven approaches for strengthening forest monitoring and supporting evidence-based forest conservation and management in the Himalayas.