Data integration in Bayesian phylogenetics.

Authors

Hassler Gabriel W, Magee Andrew, Zhang Zhenyu, Baele Guy, Lemey Philippe, Ji Xiang, Fourment Mathieu, Suchard Marc A

Affiliations

Department of Computational Medicine, University of California, Los Angeles, USA, 90095.

Department of Biostatistics, University of California, Los Angeles, USA, 90095.

Publication information

Annu Rev Stat Appl. 2023;10:353-377. doi: 10.1146/annurev-statistics-033021-112532. Epub 2022 Sep 28.

Abstract

Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.

Cited by

3
Multi-response phylogenetic mixed models: concepts and application.
Biol Rev Camb Philos Soc. 2025 Jun;100(3):1294-1316. doi: 10.1111/brv.70001. Epub 2025 Apr 7.
5
Leveraging graphical model techniques to study evolution on phylogenetic networks.
Philos Trans R Soc Lond B Biol Sci. 2025 Feb 13;380(1919):20230310. doi: 10.1098/rstb.2023.0310. Epub 2025 Feb 20.
