Cule Madeleine, Donnelly Peter
Department of Statistics, 1 South Parks Road, Oxford OX1 3TG.
Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN.
Ann Appl Stat. 2017;11(2):655-679. doi: 10.1214/16-aoas1011.
The combination of genetic information with electronic patient records promises to provide a powerful new resource for understanding human disease and its treatment. Here we develop a novel stochastic compartmental model and apply it to a large dataset on Clostridium difficile infection (CDI) in three Oxfordshire hospitals over a 2.5-year period, which combines genetic information on 858 confirmed cases of CDI with a database of 750,000 patient records. Clostridium difficile is a major cause of healthcare-associated diarrhoea and is responsible for substantial mortality and morbidity, yet relatively little is known about its biology or its transmission epidemiology. Bayesian analysis of our model, via Markov chain Monte Carlo, provides new information about the biology of CDI, including genetic heterogeneity in infectiousness across different sequence types and evidence for ward contamination as a significant mode of transmission. It also allows inferences about the contribution of particular individuals, wards, or hospitals to transmission of the bacterium, and assessment of changes in these contributions over time following changes in hospital practice. Our work demonstrates the value of using statistical modelling and computational inference on large-scale hospital patient databases and genetic data.
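For readers unfamiliar with the term, the following is a minimal illustrative sketch of what a stochastic compartmental model with a ward-contamination term can look like. It is not the model developed in the paper: the two-compartment (susceptible/infectious) structure, the daily time step, the function name `simulate_ward`, and all parameter names and values are assumptions chosen for illustration only.

```python
import math
import random


def simulate_ward(n_patients, n_days, beta, background, recovery, seed=0):
    """Chain-binomial simulation of one ward; returns daily (S, I) counts.

    Hypothetical parameters (not taken from the paper):
      beta       - person-to-person transmission rate per day
      background - constant daily hazard from ward contamination
      recovery   - daily probability that an infectious patient recovers
    """
    rng = random.Random(seed)
    susceptible, infectious = n_patients - 1, 1
    history = [(susceptible, infectious)]
    for _ in range(n_days):
        # Daily acquisition hazard: a person-to-person term plus a
        # constant term standing in for ward contamination.
        hazard = beta * infectious / n_patients + background
        p_infect = 1.0 - math.exp(-hazard)
        # Each susceptible patient is infected independently.
        new_cases = sum(rng.random() < p_infect for _ in range(susceptible))
        # Each infectious patient recovers independently and
        # returns to the susceptible compartment.
        recoveries = sum(rng.random() < recovery for _ in range(infectious))
        susceptible += recoveries - new_cases
        infectious += new_cases - recoveries
        history.append((susceptible, infectious))
    return history
```

Calling `simulate_ward(20, 30, beta=0.5, background=0.01, recovery=0.1)` returns 31 daily `(S, I)` pairs for a 20-bed ward. In a Bayesian analysis of the kind the abstract describes, parameters such as `beta` and `background` would be treated as unknowns and inferred from data rather than fixed by hand.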