Wong Weng-Keen, Moore Andrew, Cooper Gregory, Wagner Michael
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
J Urban Health. 2003 Jun;80(2 Suppl 1):i66-75. doi: 10.1007/pl00022317.
This article presents an algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns. Traditional techniques for anomaly detection are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features. Thus, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new outbreak. Instead, we would like to detect groups with specific characteristics that have a recent pattern of illness that is anomalous relative to historical patterns. We propose using an anomaly detection algorithm that would characterize each anomalous pattern with a rule. The significance of each rule would be carefully evaluated using the Fisher exact test and a randomization test. In this study, we compared our algorithm with a standard detection algorithm by measuring the number of false positives and the timeliness of detection. Simulated data, produced by a simulator that creates the effects of an epidemic on a city, were used for evaluation. The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.
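The rule-scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each case is a dictionary of categorical features, that a rule is a single feature=value pair, that a rule is scored by a two-sided Fisher exact test on a 2x2 table of recent versus baseline counts, and that the randomization test shuffles cases between the recent and baseline groups. The function names, the record layout, and the single-component rule form are all hypothetical choices made for illustration.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def score_rule(recent, baseline, feature, value):
    """P-value for the (assumed) rule feature == value, recent vs. baseline."""
    a = sum(1 for r in recent if r[feature] == value)
    c = sum(1 for r in baseline if r[feature] == value)
    return fisher_exact_two_sided(a, len(recent) - a, c, len(baseline) - c)


def best_rule(recent, baseline):
    """Return (p_value, rule) for the lowest-scoring rule seen in recent cases."""
    rules = sorted({(f, v) for r in recent for f, v in r.items()})
    return min((score_rule(recent, baseline, f, v), (f, v)) for f, v in rules)


def randomization_p_value(recent, baseline, n_iter=200, seed=0):
    """Compensate for searching many rules by shuffling group membership."""
    rng = random.Random(seed)
    observed, rule = best_rule(recent, baseline)
    pooled = recent + baseline
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        fake_recent = pooled[:len(recent)]
        fake_baseline = pooled[len(recent):]
        # Count shuffles whose best rule looks at least as extreme as the real one.
        if best_rule(fake_recent, fake_baseline)[0] <= observed:
            hits += 1
    return rule, observed, (hits + 1) / (n_iter + 1)


# Made-up toy data: respiratory complaints jump from 10% to 50% of cases.
baseline_cases = [
    {"symptom": "respiratory" if i % 10 == 0 else ("rash" if i % 10 == 1 else "other"),
     "gender": "F" if i % 2 else "M"}
    for i in range(200)
]
recent_cases = [
    {"symptom": "respiratory" if i % 2 == 0 else ("rash" if i % 10 == 1 else "other"),
     "gender": "F" if i % 2 else "M"}
    for i in range(40)
]

if __name__ == "__main__":
    rule, raw_p, adjusted_p = randomization_p_value(recent_cases, baseline_cases)
    print(rule, raw_p, adjusted_p)
```

The randomization step matters because the search reports only the best of many candidate rules, so its raw Fisher p-value is optimistic. Comparing it against the best scores obtained on shuffled data gives a p-value that accounts for the search itself.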