The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study.

Authors

Hall Matthew D, Woolhouse Mark E J, Rambaut Andrew

Affiliations

Institute of Evolutionary Biology, University of Edinburgh EH9 3FL, Edinburgh, UK; Centre for Immunity, Infection and Evolution, University of Edinburgh EH9 3FL, Edinburgh, UK.

Institute of Evolutionary Biology, University of Edinburgh EH9 3FL, Edinburgh, UK; Centre for Immunity, Infection and Evolution, University of Edinburgh EH9 3FL, Edinburgh, UK; Fogarty International Center, National Institutes of Health, Bethesda, MD 20892-2220, USA.

Publication information

Virus Evol. 2016 Mar 2;2(1):vew003. doi: 10.1093/ve/vew003. eCollection 2016 Jan.

Abstract

The ongoing large-scale increase in the total amount of genetic data for viruses and other pathogens has led to a situation in which it is often not possible to include every available sequence in a phylogenetic analysis and expect the procedure to complete in reasonable computational time. This raises questions about how a set of sequences should be selected for analysis, particularly if the data are used to infer more than just the phylogenetic tree itself. The design of sampling strategies for molecular epidemiology has been a neglected field of research. This article describes a large-scale simulation exercise that was undertaken to select an appropriate strategy when using the GMRF skygrid, one of the Bayesian skyline family of coalescent methods, in order to reconstruct past population dynamics. The simulated scenarios were intended to represent sampling for the population of an endemic virus across multiple geographical locations. Large phylogenies were simulated under a coalescent or structured coalescent model and sequences simulated from these trees; the resulting datasets were then downsampled for analyses according to a variety of schemes. Variation in results between different replicates of the same scheme was not insignificant, and as a result, we recommend that where possible analyses are repeated with different datasets in order to establish that elements of a reconstruction are not simply the result of the particular set of samples selected. We show that an individual stochastic choice of sequences can introduce spurious behaviour in the median line of the skygrid plot and that even marginal likelihood estimation can suggest complicated dynamics that were not in fact present. We recommend that the median line should not be used to infer historical events on its own. 
Sampling sequences with uniform probability with respect to both time and spatial location (deme) never performed worse than sampling with probability proportional to the effective population size at that time and in that location and frequently was superior. As a result, we recommend this approach in the design of future studies. We also confirm that the inclusion of many recent sequences from a single geographical location in an analysis tends to result in a spurious bottleneck effect in the reconstruction and caution against interpreting this as genuine.
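The contrast between the two sampling schemes compared in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation pipeline: the function names, the grid of (time, deme) cells, and the effective population sizes below are all hypothetical values chosen for illustration. Under the uniform scheme every cell is equally likely to contribute a sequence, while under the proportional scheme a cell's probability scales with its effective population size.

```python
import random


def cell_probabilities(ne, scheme):
    """Return {(time, deme): probability} for a sampling scheme.

    `ne` maps each (time, deme) cell to a hypothetical effective
    population size. "uniform" weights every cell equally;
    "proportional" weights each cell by its effective population size.
    """
    if scheme == "uniform":
        weights = {cell: 1.0 for cell in ne}
    elif scheme == "proportional":
        weights = dict(ne)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def draw_cells(ne, scheme, n, seed=0):
    """Draw n (time, deme) cells, with replacement, under the scheme."""
    probs = cell_probabilities(ne, scheme)
    cells = list(probs)
    rng = random.Random(seed)  # seeded so a replicate can be reproduced
    return rng.choices(cells, weights=[probs[c] for c in cells], k=n)


# Hypothetical effective population sizes: two time points, two demes.
ne = {(0, "A"): 100.0, (0, "B"): 300.0, (1, "A"): 200.0, (1, "B"): 400.0}

uniform = cell_probabilities(ne, "uniform")
proportional = cell_probabilities(ne, "proportional")
replicate = draw_cells(ne, "uniform", n=50, seed=1)
```

Calling `draw_cells` with different `seed` values yields different replicate samples under the same scheme, which mirrors the abstract's recommendation to repeat analyses with different datasets before interpreting features of a reconstruction.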

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e45/4989886/cf1a20a08022/vew003f1p.jpg
