Hill A A, Crotta M, Wall B, Good L, O'Brien S J, Guitian J
CORDA, BAE Systems, Farnborough, UK.
Royal Veterinary College, University of London, London, UK.
R Soc Open Sci. 2017 Mar 29;4(3):160721. doi: 10.1098/rsos.160721. eCollection 2017 Mar.
Foodborne infection is a result of exposure to complex, dynamic food systems. The efficiency of foodborne infection is driven by ongoing shifts in genetic machinery. Next-generation sequencing technologies can provide high-fidelity data about the genetics of a pathogen. However, food safety surveillance systems do not currently provide similar high-fidelity epidemiological metadata to associate with genetic data. As a consequence, it is rarely possible to transform genetic data into actionable knowledge that can genuinely inform risk assessment or prevent outbreaks. Big data approaches are touted as a revolution in decision support, and offer a potentially attractive way to close the gap between the fidelity of genetic data and that of epidemiological metadata in food safety surveillance. We therefore developed a simple food chain model to investigate the potential benefits of combining 'big' data sources, including both genetic and high-fidelity epidemiological metadata. Our results suggest that, as for any surveillance system, the collected data must be relevant and must characterize the important dynamics of a system if we are to properly understand risk. This points to the need to consider data curation carefully, rather than relying on the more ambitious claim of big data proponents that unstructured and unrelated data sources can be combined to generate consistent insight. Notably, the biggest influencers of foodborne infection risk were contamination load and processing temperature, not genotype. This suggests that understanding food chain dynamics would probably generate insight into foodborne risk more effectively than specifying the hazard in ever more detail in terms of genotype.