Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, USA.
Biodesign Center for Environmental Health Engineering, Arizona State University, Tempe, Arizona, USA.
Sci Rep. 2018 Apr 12;8(1):5905. doi: 10.1038/s41598-018-24264-8.
The use of generalized linear models in Bayesian phylogeography has enabled researchers to simultaneously reconstruct the spatiotemporal history of a virus and quantify the contribution of predictor variables to that process. However, little is known about the sensitivity of this method to the choice of the discrete state partition. Here we investigate this question by analyzing a data set containing 299 sequences of the West Nile virus envelope gene sampled in the United States and fifteen predictors aggregated at four spatial levels. We demonstrate that although the topology of the viral phylogenies was consistent across analyses, support for the predictors depended on the level of aggregation. In particular, we found that for several predictors the variance of the predictor support metrics was minimized at the most precise level and maximized at sparser levels of aggregation. These results suggest that care should be taken when partitioning a region into discrete locations to ensure that interpretable, reproducible posterior estimates are obtained. They also demonstrate why researchers should use the most precise discrete states possible, both to minimize the posterior variance in such estimates and to reveal what truly drives the diffusion of viruses.