Anomaly Detection in Endemic Disease Surveillance Data Using Machine Learning Techniques.

Author information

Eze Peter U, Geard Nicholas, Mueller Ivo, Chades Iadine

Affiliations

School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia.

Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia.

Publication information

Healthcare (Basel). 2023 Jun 30;11(13):1896. doi: 10.3390/healthcare11131896.

DOI:10.3390/healthcare11131896

PMID:37444730

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10341307/

Abstract

Disease surveillance is used to monitor ongoing control activities, detect early outbreaks, and inform intervention priorities and policies. However, data from disease surveillance that could be used to support real-time decision-making remain largely underutilised. Using the Brazilian Amazon malaria surveillance dataset as a case study, in this paper we explore the potential for unsupervised anomaly detection machine learning techniques to discover signals of epidemiological interest. We found that our models were able to provide an early indication of outbreak onset, outbreak peaks, and change points in the proportion of positive malaria cases. Specifically, the sustained rise in malaria in the Brazilian Amazon in 2016 was flagged by several models. We found that no single model detected all anomalies across all health regions. Because of this, we identify the minimum number of machine learning models needed to maximise the number of anomalies detected across different health regions. We discovered that the top three models that maximise the coverage of the number and types of anomalies detected across the thirteen health regions are principal component analysis, stochastic outlier selection, and the minimum covariance determinant. Anomaly detection is a potentially valuable approach to discovering patterns of epidemiological importance when confronted with a large volume of data across space and time. Our exploratory approach can be replicated for other diseases and locations to inform monitoring, timely interventions, and actions towards the goal of controlling endemic disease.
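The abstract does not say how the minimal set of models was computed. As an illustration only, the sketch below frames the selection as a set-cover problem: each model's output is a set of hypothetical (health region, anomaly) detections, and models are added greedily by how many not-yet-covered detections they contribute. The function name, model names, regions and anomaly labels are all hypothetical toy values, not the paper's method or results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical detection record: (health_region, anomaly_label), e.g. ("R1", "onset").
Detection = Tuple[str, str]


def greedy_model_selection(detections: Dict[str, Set[Detection]]) -> List[str]:
    """Greedily pick models until no remaining model adds a new detection.

    At each step, the model covering the most not-yet-covered detections is
    chosen (the standard greedy heuristic for set cover). Ties are broken by
    model name so the result is deterministic.
    """
    covered: Set[Detection] = set()
    selected: List[str] = []
    remaining = dict(detections)  # shallow copy; the input sets are not mutated
    while remaining:
        # Rank models by the number of new detections they would add.
        best = min(remaining, key=lambda m: (-len(remaining[m] - covered), m))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining model is redundant
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected


if __name__ == "__main__":
    # Toy data only; model names M1..M4 are placeholders.
    toy = {
        "M1": {("R1", "onset"), ("R2", "peak"), ("R3", "change")},
        "M2": {("R1", "onset"), ("R4", "peak")},
        "M3": {("R5", "change")},
        "M4": {("R2", "peak")},
    }
    print(greedy_model_selection(toy))  # → ['M1', 'M2', 'M3']
```

Greedy selection is fast but is not guaranteed to return the smallest covering set. With only a handful of candidate models, an exhaustive search over subsets or a small integer program would give an exact answer.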
