De Maio Nicola, Wu Chieh-Hsi, O'Reilly Kathleen M, Wilson Daniel
Institute for Emerging Infections, Oxford Martin School, Oxford, United Kingdom; Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
PLoS Genet. 2015 Aug 12;11(8):e1005421. doi: 10.1371/journal.pgen.1005421. eCollection 2015 Aug.
Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role in informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.