Nsoesie Elaine O, Beckman Richard, Marathe Madhav, Lewis Bryan
Network Dynamics and Simulation Science Laboratory, Virginia Bioinformatics Institute at Virginia Tech.
Stat Commun Infect Dis. 2011 Jan 1;3(1). doi: 10.2202/1948-4690.1038. Epub 2011 Oct 4.
Classification methods are widely used for identifying underlying groupings within datasets and for predicting the class of new data objects given a trained classifier. This study introduces a project aimed at using a combination of simulations and classification techniques to predict epidemic curves and infer underlying disease parameters for an ongoing outbreak. Six supervised classification methods (random forest, support vector machines, nearest neighbor with three decision rules, linear and flexible discriminant analysis) were used to identify partial epidemic curves from six agent-based stochastic simulations of influenza epidemics. The accuracy of the methods was compared using a performance metric based on the McNemar test. The findings showed that: (1) the assumptions a method makes about the structure of an epidemic curve influence its performance, i.e., methods with fewer assumptions perform best; (2) the performance of most methods is consistent across different individual-based networks for Seattle, Los Angeles and New York; and (3) combining classifiers using a weighting approach does not guarantee better prediction.
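The abstract does not spell out how the McNemar-based performance metric is constructed, so the following is only a minimal sketch of the standard McNemar test (with continuity correction) for comparing two classifiers scored on the same test cases. The function name `mcnemar_test`, the label encoding, and the example predictions are hypothetical and are not taken from the study.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Compare two classifiers on the same test cases with McNemar's test.

    Returns (b, c, statistic, p_value), where
      b = cases classifier A labels correctly and classifier B does not,
      c = cases classifier B labels correctly and classifier A does not.
    Cases where both are right or both are wrong do not enter the test.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return b, c, 0.0, 1.0
    # Chi-square statistic with continuity correction, 1 degree of freedom.
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return b, c, statistic, p_value


# Hypothetical example: 20 partial curves, all truly from simulation class 0.
y_true = [0] * 20
pred_a = [1] * 2 + [0] * 18            # classifier A misses 2 cases
pred_b = [0] * 2 + [1] * 10 + [0] * 8  # classifier B misses 10 other cases
b, c, statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(b, c, round(statistic, 3), round(p_value, 3))
```

Because the test uses only the discordant cases, two classifiers with similar overall accuracy can still differ significantly if their errors fall on different curves. With small discordant counts, an exact binomial version of the test is usually preferred over the chi-square approximation used here.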