Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.
Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom.
PLoS Comput Biol. 2013;9(12):e1003397. doi: 10.1371/journal.pcbi.1003397. Epub 2013 Dec 19.
Identifying the source of transmission using pathogen genetic data is complicated by numerous biological, immunological, and behavioral factors. A large source of error arises when sampling of cases is incomplete or sparse. Unsampled cases may act either as a common source of infection or as an intermediary in a transmission chain for hosts infected with genetically similar pathogens. The probability of common-source or intermediate transmission events is hard to quantify, which has made it difficult to develop statistical tests that confirm or refute putative transmission pairs using genetic data. We present a method that incorporates additional information about an infectious disease epidemic, such as incidence and prevalence of infection over time, to inform estimates of the probability that one sampled host is the direct source of infection of another host in a pathogen gene genealogy. These methods enable forensic applications, such as source-case attribution, for infectious disease epidemics with incomplete sampling, which is usually the case for high-morbidity community-acquired pathogens like HIV, influenza, and dengue virus. These methods also enable epidemiological applications, such as the identification of factors that increase the risk of transmission. We demonstrate these methods in the context of the HIV epidemic in Detroit, Michigan, and we evaluate the suitability of current sequence databases for forensic and epidemiological investigations. We find that currently available sequences collected for drug resistance testing of HIV are unlikely to be useful in most forensic investigations, but are useful for identifying transmission risk factors.