Salathé Marcel
Digital Epidemiology Laboratory, School of Life Sciences and School of Computer and Communication Sciences, EPFL, Geneva, Switzerland.
J Infect Dis. 2016 Dec 1;214(suppl_4):S399-S403. doi: 10.1093/infdis/jiw281.
The digital revolution has contributed to very large data sets (ie, big data) relevant for public health. The two major data sources are electronic health records from traditional health systems and patient-generated data. As the two data sources have complementary strengths (high veracity in the data from traditional sources and high velocity and variety in patient-generated data), they can be combined to build more robust public health systems. However, they also have unique challenges. Patient-generated data in particular are often completely unstructured and highly context dependent, posing essentially a machine-learning challenge. Some recent examples from infectious disease surveillance and adverse drug event monitoring demonstrate that the technical challenges can be solved. Despite these advances, the problem of verification remains, and unless traditional and digital epidemiologic approaches are combined, these data sources will be constrained by their intrinsic limits.