Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography.

Author information

Scotch Matthew, Tahsin Tasnia, Weissenbacher Davy, O'Connor Karen, Magge Arjun, Vaiente Matteo, Suchard Marc A, Gonzalez-Hernandez Graciela

Affiliations

College of Health Solutions, Arizona State University, 550 N. 3rd St., Phoenix, AZ, USA.

Biodesign Center for Environmental Health Engineering, Arizona State University, 727 E. Tyler St, Tempe, AZ, USA.

Publication information

Virus Evol. 2019 Feb 28;5(1):vey043. doi: 10.1093/ve/vey043. eCollection 2019 Jan.

Abstract

Discrete phylogeography using software such as BEAST treats the sampling location of each taxon as fixed, often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the infected host for that taxon was located somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty (10, 30, and 50 per cent) and varied how it was distributed for each taxon. The scenarios: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location, thus 'splitting' the uncertainty between two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or to the location with the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP.
For H5N1, the median absolute error of virus persistence ranged from 0.005 to 0.047 across the scenarios with sampling uncertainty, (i) and (ii) above, versus 0.063 to 0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend, as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found that all but one of the H5N1 scenarios with sampling uncertainty agreed with the RS on the origin of the outbreak, whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography compared with heuristics. We also found that, in general, how the sampling uncertainty was assigned made limited difference: distributing it uniformly or splitting it between two locations did not greatly affect posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography.
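
To make scenarios (i) and (ii) concrete, the following is a minimal Python sketch of how per-taxon location probabilities could be built, and how five random 25 per cent subsets of taxa might be drawn. It is an illustration only, not the authors' code or the BEAST implementation. The function names, the location names, and the taxon identifiers are hypothetical, and the sketch assumes that the named amount (for example 0.10) is the probability placed on one focal location.

```python
import random


def uniform_scenario(focal, candidates, p):
    """Scenario (i), as assumed here: probability p on the focal location,
    with the remaining 1 - p shared equally by the other candidates."""
    others = [c for c in candidates if c != focal]
    if not others:
        return {focal: 1.0}
    share = (1.0 - p) / len(others)
    probs = {c: share for c in others}
    probs[focal] = p
    return probs


def split_scenario(focal, other, p):
    """Scenario (ii), as assumed here: probability p on the focal location
    and the remaining 1 - p on exactly one other location."""
    return {focal: p, other: 1.0 - p}


def pick_uncertain_taxa(taxa, fraction=0.25, replicates=5, seed=0):
    """Draw several random subsets of taxa that will carry uncertainty."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    k = round(len(taxa) * fraction)
    return [sorted(rng.sample(taxa, k)) for _ in range(replicates)]


# Hypothetical candidate locations and taxon names, for illustration only.
candidates = ["Cairo", "Giza", "Alexandria", "Luxor"]
taxa = [f"taxon_{i:02d}" for i in range(20)]

uniform_10 = uniform_scenario("Cairo", candidates, 0.10)
split_30_70 = split_scenario("Cairo", "Giza", 0.30)
subsets = pick_uncertain_taxa(taxa)

print(uniform_10)
print(split_30_70)
print(len(subsets), "subsets of", len(subsets[0]), "taxa each")
```

In both constructions the probabilities for a taxon sum to one, so each result can be read as a discrete distribution over that taxon's candidate sampling locations.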


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbec/6395475/4597218e26cb/vey043f1.jpg
