Shah Denis A, De Wolf Erick D, Paul Pierce A, Madden Laurence V
Department of Plant Pathology, Kansas State University, Manhattan, KS 66506.
Department of Plant Pathology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, OH 44691.
Phytopathology. 2023 Aug;113(8):1483-1493. doi: 10.1094/PHYTO-10-22-0380-R. Epub 2023 Oct 3.
Constructing models that accurately predict Fusarium head blight (FHB) epidemics and are also amenable to large-scale deployment is a challenging task. In the United States, the emphasis has been on simple logistic regression (LR) models, which are easy to implement but may be less accurate than more complicated model frameworks, such as functional or boosted regressions, that are harder to deploy over large geographies. This article examined the plausibility of random forests (RFs) for the binary prediction of FHB epidemics as a possible middle ground between model simplicity and complexity that does not sacrifice accuracy. A minimal set of predictors was also desirable, rather than having the RF model use all 90 candidate variables. The input predictor set was filtered with the aid of three RF variable selection algorithms (Boruta, varSelRF, and VSURF), using resampling techniques to quantify the variability and stability of the selected variable sets. Post-selection filtering produced 58 competitive RF models with no more than 14 predictors each. One variable representing temperature stability in the 20 days before anthesis was the most frequently selected predictor. This was a departure from the prominence of relative humidity-based variables previously reported in LR models for FHB. The RF models had overall superior predictive performance over the LR models and may be suitable candidates for use by the Fusarium Head Blight Prediction Center.
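The abstract does not say how the variability and stability of the selected variable sets were measured. As a minimal sketch of one common approach, the Python below assumes that each resampling run of a selection algorithm returns a set of variable names, then reports how often each variable was selected and the mean pairwise Jaccard similarity between runs. The function names, the variable names, and the four example runs are all hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(selected_sets):
    """Fraction of resampling runs in which each variable was selected."""
    n_runs = len(selected_sets)
    counts = Counter(var for s in selected_sets for var in set(s))
    return {var: count / n_runs for var, count in counts.items()}


def jaccard(a, b):
    """Jaccard similarity of two variable sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selected_sets):
    """Average Jaccard similarity over all pairs of runs; higher = more stable."""
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two resampling runs")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical output of a variable selection algorithm on four resamples.
# These names are illustrative placeholders, not the study's predictors.
runs = [
    {"temp_stability_pre20", "rh_mean_pre7", "tmin_post5"},
    {"temp_stability_pre20", "rh_mean_pre7"},
    {"temp_stability_pre20", "tmin_post5"},
    {"temp_stability_pre20", "rh_mean_pre7", "vpd_pre10"},
]

freq = selection_frequency(runs)
stability = mean_pairwise_jaccard(runs)

# Print variables from most to least frequently selected.
for var, f in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{var}: {f:.2f}")
print(f"mean pairwise Jaccard: {stability:.2f}")
```

In a setup like this, a variable selected in every run would be a natural candidate for a reduced model, and a low mean Jaccard value would indicate that the selected sets change substantially from one resample to the next.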