Asger Hobolth, Eric A. Stone
Department of Mathematical Sciences, Aarhus University, Denmark.
Ann Appl Stat. 2009 Sep 1;3(3):1204. doi: 10.1214/09-AOAS247.
Analyses of serially sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model, and its popularity extends to disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that are observed only at discrete times. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length T. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix Q of the CTMC, its beginning and ending states, and the length of the sampling interval T. In doing so, we show that no method dominates the others across all model specifications, and we give an explicit proof of which method prevails for any given Q, T, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler.
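As an illustration only, here is a minimal pure-Python sketch of two of the three approaches named above, modified rejection sampling and uniformization. It is not the authors' code. It assumes that Q is a list of lists whose rows sum to zero, that states are indexed 0..n-1, and that a path is returned as a list of (jump_time, new_state) pairs. The function names, the `max_tries` cap, and the Poisson truncation point `n_max` are hypothetical choices. Direct sampling is omitted for brevity. The sketch also hints at the trade-off studied in the paper: rejection sampling wastes proposals when X(T) = b is unlikely under forward simulation, while uniformization's cost grows with the number of (mostly virtual) jumps, roughly mu * T.

```python
import math
import random


def _jump_target(Q, i, rng):
    """Pick j != i with probability Q[i][j] / -Q[i][i] (diagonal gets weight 0)."""
    weights = [0.0 if j == i else Q[i][j] for j in range(len(Q))]
    return rng.choices(range(len(Q)), weights=weights)[0]


def _forward(Q, state, t, T, rng):
    """Unconditional forward simulation from (t, state) up to time T."""
    path = []
    while -Q[state][state] > 0.0:  # stop at absorbing states
        t += rng.expovariate(-Q[state][state])  # exponential holding time
        if t >= T:
            break
        state = _jump_target(Q, state, rng)
        path.append((t, state))
    return path, state


def modified_rejection_sample(Q, a, b, T, rng=random, max_tries=100_000):
    """Path on [0, T] with X(0) = a, conditioned on X(T) = b, by rejection.

    When a != b the first jump is forced into [0, T] (the "modification"),
    so a no-jump path is never proposed only to be rejected.
    """
    for _ in range(max_tries):
        if a == b:
            path, end = _forward(Q, a, 0.0, T, rng)
        else:
            rate = -Q[a][a]
            if rate <= 0.0:
                raise ValueError("state a is absorbing, so X(T) = b is impossible")
            # Inverse CDF of an Exp(rate) variable truncated to [0, T].
            u = rng.random()
            t1 = -math.log(1.0 - u * (1.0 - math.exp(-rate * T))) / rate
            s1 = _jump_target(Q, a, rng)
            rest, end = _forward(Q, s1, t1, T, rng)
            path = [(t1, s1)] + rest
        if end == b:  # accept only paths that finish in the required state
            return path
    raise RuntimeError("no path accepted; the endpoint b may be very unlikely")


def _matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def uniformization_sample(Q, a, b, T, rng=random):
    """Path on [0, T] with X(0) = a, conditioned on X(T) = b, by uniformization.

    Assumes moderate mu * T: exp(-mu * T) underflows for mu * T above ~700.
    """
    n = len(Q)
    mu = max(-Q[i][i] for i in range(n))  # dominating (uniformization) rate
    if mu == 0.0:  # Q == 0: the chain never moves
        if a != b:
            raise ValueError("Q is zero, so X(T) = b is impossible")
        return []
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # R = I + Q / mu is a stochastic matrix; its self-loops are "virtual" jumps.
    R = [[eye[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    n_max = int(mu * T + 10.0 * math.sqrt(mu * T) + 20)  # Poisson tail cut-off
    powers = [eye]  # powers[m] = R^m
    for _ in range(n_max):
        powers.append(_matmul(powers[-1], R))
    # Pr(N = m, X(T) = b | X(0) = a) = Poisson(m; mu * T) * (R^m)[a][b]
    weights, pmf = [], math.exp(-mu * T)
    for m in range(n_max + 1):
        weights.append(pmf * powers[m][a][b])
        pmf *= mu * T / (m + 1)
    n_jumps = rng.choices(range(n_max + 1), weights=weights)[0]
    times = sorted(rng.uniform(0.0, T) for _ in range(n_jumps))
    path, state = [], a
    for i, t in enumerate(times):
        left = n_jumps - i - 1  # jumps still to come after this one
        # Pr(next = x | current, X(T) = b) is proportional to R[state][x] * (R^left)[x][b]
        w = [R[state][x] * powers[left][x][b] for x in range(n)]
        nxt = rng.choices(range(n), weights=w)[0]
        if nxt != state:  # drop virtual (self-loop) jumps
            path.append((t, nxt))
        state = nxt
    return path
```

For example, with `Q = [[-1.0, 1.0], [2.0, -2.0]]`, both `modified_rejection_sample(Q, 0, 1, 1.5)` and `uniformization_sample(Q, 0, 1, 1.5)` return a list of (time, state) jumps whose last state is 1. The explicit matrix powers keep the uniformization sketch dependency-free. A practical version would more likely use a numerical library for R^m and P(T).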