Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

Authors

Hobolth Asger, Stone Eric A

Affiliation

Department of Mathematical Sciences, Aarhus University, Denmark.

Publication information

Ann Appl Stat. 2009 Sep 1;3(3):1204. doi: 10.1214/09-AOAS247.

DOI:10.1214/09-AOAS247

PMID:20148133

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2818752/

Abstract

Analyses of serially sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model whose popularity extends to a variety of disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that is discretely observed. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length T. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix Q of the CTMC, its beginning and ending states, and the length of sampling time T. In doing so, we show that no method dominates the others across all model specifications, and we give explicit proof of which method prevails for any given Q, T, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler.
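To make the sampling problem concrete, the following is a minimal sketch of plain rejection sampling: simulate the chain forward from the beginning state and keep the path only if it sits in the required ending state at time T. This is the naive baseline, not the modified rejection sampler, direct sampler, or uniformization scheme analyzed in the paper. The function names, the 3-state rate matrix `Q_example`, and the `max_tries` cap are illustrative assumptions that do not come from the source.

```python
import random


def simulate_forward(Q, a, T, rng):
    """Forward-simulate a CTMC with rate matrix Q from state a over [0, T).

    Returns the path as a list of (jump_time, new_state) pairs,
    starting with (0.0, a).
    """
    path = [(0.0, a)]
    t, state = 0.0, a
    while True:
        rate = -Q[state][state]          # total rate of leaving `state`
        if rate <= 0.0:                  # absorbing state: no further jumps
            return path
        t += rng.expovariate(rate)       # exponential waiting time
        if t >= T:                       # next jump falls outside the interval
            return path
        # Pick the next state with probability proportional to its rate.
        others = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in others]
        state = rng.choices(others, weights=weights)[0]
        path.append((t, state))


def sample_conditioned_path(Q, a, b, T, rng, max_tries=100_000):
    """Plain rejection sampling: retry until the path ends in state b.

    max_tries is an assumed safety cap, not part of any published method.
    """
    for _ in range(max_tries):
        path = simulate_forward(Q, a, T, rng)
        if path[-1][1] == b:
            return path
    raise RuntimeError("no accepted path; endpoint may be too unlikely")


# Hypothetical 3-state rate matrix (rows sum to zero), for illustration only.
Q_example = [
    [-1.0, 0.5, 0.5],
    [0.3, -0.6, 0.3],
    [0.2, 0.2, -0.4],
]

if __name__ == "__main__":
    rng = random.Random(0)
    for time, state in sample_conditioned_path(Q_example, 0, 2, 1.0, rng):
        print(f"t={time:.3f}  state={state}")
```

The expected number of forward simulations per accepted path is the reciprocal of the transition probability from the beginning state to the ending state over time T, so this baseline becomes slow when that probability is small. That cost is one way to see why the abstract's comparison of samplers across Q, T, and endpoints matters.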
