Network Dynamics and Simulation Science Laboratory/Virginia Bioinformatics Institute/Virginia Tech, Blacksburg, Virginia, USA.
BMC Infect Dis. 2014 Jan 9;14:12. doi: 10.1186/1471-2334-14-12.
A forecast is a quantitative estimate of a future event, or of the probabilities assigned to a future occurrence. Forecasting stochastic processes such as epidemics is challenging because several biological, behavioral, and environmental factors influence the number of cases observed at each point during an epidemic. Accurate epidemic forecasts would nonetheless support the timely and effective implementation of public health interventions. In this study, we introduce a Dirichlet process (DP) model for classifying and forecasting influenza epidemic curves.
The DP model is a nonparametric Bayesian approach that matches current influenza activity to simulated and historical patterns, identifies epidemic curves different from those observed in the past, and predicts the expected epidemic peak time. The method was validated using simulated influenza epidemics from an individual-based model, and its accuracy was compared with that of Random Forest (RF), a tree-based classification technique that has been shown to achieve high accuracy in the early prediction of epidemic curves. We also applied the method to forecasting influenza outbreaks in the United States from 1997 to 2013 using influenza-like illness (ILI) data from the Centers for Disease Control and Prevention (CDC).
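The matching idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Chinese-restaurant-process prior over a hypothetical library of template curves, an independent Gaussian likelihood around each template, and a broad Gaussian centred on the library mean as a stand-in base measure for a previously unseen curve type. All names and parameters (`crp_posterior`, `forecast_peak`, `alpha`, `sigma`, `base_sigma`) are illustrative assumptions.

```python
import math


def log_gauss(obs, mean, sigma):
    """Log-likelihood of obs under independent Gaussians centred on mean."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (o - m) ** 2 / (2 * sigma ** 2)
        for o, m in zip(obs, mean)
    )


def crp_posterior(obs, templates, counts, alpha=1.0, sigma=1.0, base_sigma=10.0):
    """Posterior over the known curve clusters plus one extra 'new' cluster.

    obs       : partial epidemic curve observed so far
    templates : known curves, each at least as long as obs (hypothetical library)
    counts    : number of past epidemics assigned to each template
    alpha     : DP concentration; larger values favour opening a new cluster
    """
    n = sum(counts)
    t = len(obs)
    # Assumption: the base measure is a broad Gaussian around the library mean.
    base_mean = [sum(tpl[i] for tpl in templates) / len(templates) for i in range(t)]
    log_w = [
        math.log(c / (n + alpha)) + log_gauss(obs, tpl[:t], sigma)
        for tpl, c in zip(templates, counts)
    ]
    log_w.append(math.log(alpha / (n + alpha)) + log_gauss(obs, base_mean, base_sigma))
    # Normalise in log space for numerical stability.
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def forecast_peak(probs, templates):
    """Peak time index of the most probable known template, or None if 'new' wins."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if best == len(templates):
        return None  # curve looks unlike anything in the library
    tpl = templates[best]
    return max(range(len(tpl)), key=tpl.__getitem__)


templates = [[1, 2, 4, 8, 6, 3], [1, 1, 2, 3, 5, 8]]
counts = [5, 5]
probs = crp_posterior([1, 2, 4], templates, counts)
print(probs, forecast_peak(probs, templates))
```

In this toy setup, a partial curve that tracks a known template inherits that template's peak time, while a curve far from every template puts most of its posterior mass on the new cluster, which is the behaviour a fully supervised classifier cannot express.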
We made the following observations. First, the DP model performed as well as RF in identifying several of the simulated epidemics. Second, the DP model correctly forecasted the peak time several days in advance for most of the simulated epidemics. Third, the accuracy of identifying epidemics different from those already observed improved with additional data, as expected. Fourth, both methods classified epidemics with higher reproduction numbers (R) more accurately than epidemics with lower R values. Lastly, in classifying seasonal influenza epidemics based on ILI data from the CDC, the two methods performed comparably.
Although RF requires less computational time than the DP model, the algorithm is fully supervised, which implies that epidemic curves different from those previously observed will always be misclassified. In contrast, the DP model can be unsupervised, semi-supervised, or fully supervised. Since both methods have their relative merits, an approach that uses both RF and the DP model could be beneficial.