Information and Computational Sciences, James Hutton Institute, Dundee, United Kingdom.
Phytopathology. 2021 Feb;111(2):321-332. doi: 10.1094/PHYTO-05-20-0185-R. Epub 2021 Jan 13.
Information from crop disease surveillance programs and outbreak investigations provides real-world data about the drivers of epidemics. In many cases, however, only information on outbreaks is collected, and data from surrounding healthy crops are omitted. Using such data to develop models that forecast risk/no risk of disease is therefore problematic, because information on the no-risk status of healthy crops is missing. This study explored a novel application of anomaly detection techniques to derive models for forecasting crop disease risk from data composed of outbreaks only. This was done in two steps. In the training phase, the algorithms learned the envelope of weather conditions most associated with historic crop disease outbreaks. In the testing phase, the algorithms were used to hindcast historic outbreak events. Five anomaly detection algorithms were compared according to their accuracy in forecasting outbreaks: robust covariance, one-class k-means, Gaussian mixture model, kernel density estimation, and one-class support vector machine. A case study of potato late blight survey data from across Great Britain was used as proof of concept. The Gaussian mixture model had the highest forecast accuracy at 97.0%, followed by one-class k-means at 96.9%. Combining the algorithms in an ensemble added value, giving a more accurate and robust forecasting tool that can be tailored to produce region-specific alerts. The techniques used here can easily be applied to outbreak data from other crop pathosystems to derive tools for agricultural decision support.
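The core idea, learning an "envelope" from outbreak-only data and flagging new weather as risk when it falls inside that envelope, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It covers only two of the five techniques (kernel density estimation and one-class k-means) plus a simple vote ensemble. The weather features (mean temperature, humid hours), the toy data, the bandwidth, the quantile thresholds, k, and the one-vote alert rule are all hypothetical choices made for illustration.

```python
import math
import random


def standardise(train):
    """Per-feature means and standard deviations of the training (outbreak) data."""
    d, n = len(train[0]), len(train)
    means = [sum(row[j] for row in train) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / n) or 1.0
            for j in range(d)]
    return means, stds


def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def quantile(values, q):
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


class KDEDetector:
    """Gaussian KDE: a point is inside the envelope if its density is not unusually low."""

    def __init__(self, bandwidth=0.5, q=0.05):
        self.bandwidth, self.q = bandwidth, q

    def _density(self, x, pts):
        # Normalising constant omitted; it cancels when comparing against the threshold.
        h2 = 2 * self.bandwidth ** 2
        return sum(math.exp(-sq_dist(x, p) / h2) for p in pts) / len(pts)

    def fit(self, pts):
        self.pts = pts
        # Leave-one-out scores so a training point does not vote for itself.
        loo = [self._density(p, pts[:i] + pts[i + 1:]) for i, p in enumerate(pts)]
        self.threshold = quantile(loo, self.q)  # hypothetical 5% cut-off
        return self

    def inside(self, x):
        return self._density(x, self.pts) >= self.threshold


class KMeansDetector:
    """One-class k-means: a point is inside if it is close enough to a centroid."""

    def __init__(self, k=2, q=0.95, iters=50, seed=0):
        self.k, self.q, self.iters, self.seed = k, q, iters, seed

    def _nearest(self, x):
        return min(sq_dist(x, c) for c in self.centroids)

    def fit(self, pts):
        self.centroids = [list(p) for p in random.Random(self.seed).sample(pts, self.k)]
        for _ in range(self.iters):  # Lloyd's algorithm
            groups = [[] for _ in range(self.k)]
            for p in pts:
                j = min(range(self.k), key=lambda i: sq_dist(p, self.centroids[i]))
                groups[j].append(p)
            for j, g in enumerate(groups):
                if g:  # keep the previous centroid if a cluster empties
                    self.centroids[j] = [sum(col) / len(g) for col in zip(*g)]
        self.threshold = quantile([self._nearest(p) for p in pts], self.q)
        return self

    def inside(self, x):
        return self._nearest(x) <= self.threshold


def ensemble_risk(x, detectors, means, stds, min_votes=1):
    """Flag risk when at least min_votes detectors place x inside the outbreak envelope."""
    z = scale(x, means, stds)
    return sum(d.inside(z) for d in detectors) >= min_votes


# Synthetic, made-up weather vectors preceding outbreaks: (mean temp in C, humid hours).
rng = random.Random(42)
outbreaks = [[rng.gauss(15, 2), rng.gauss(11, 3)] for _ in range(200)]
means, stds = standardise(outbreaks)
scaled = [scale(p, means, stds) for p in outbreaks]
detectors = [KDEDetector().fit(scaled), KMeansDetector().fit(scaled)]

print(ensemble_risk([15, 11], detectors, means, stds))  # → True
print(ensemble_risk([35, 0], detectors, means, stds))   # → False
```

Standardising first keeps one feature from dominating the distances. Because only outbreaks are seen in training, the thresholds are set from quantiles of the training scores instead of from labelled healthy crops. The remaining techniques named above (robust covariance, Gaussian mixture model, one-class SVM) could join the same vote. For example, scikit-learn provides `EllipticEnvelope`, `GaussianMixture`, `KernelDensity` and `OneClassSVM`.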