Doss Charles R, Suchard Marc A, Holmes Ian, Kato-Maeda Midori, Minin Vladimir N
University of Washington, Seattle.
University of California, Los Angeles.
Ann Appl Stat. 2013;7(4):2315-2335. doi: 10.1214/13-AOAS673.
Continuous-time linear birth-death-immigration (BDI) processes are frequently used in ecology and epidemiology to model the stochastic dynamics of a population of interest. In clinical settings, multiple birth-death processes can describe the disease trajectories of individual patients, allowing estimation of the effects of individual covariates on the birth and death rates of the process. Such estimation is usually accomplished by analyzing patient data collected at unevenly spaced time points, referred to as panel data in the biostatistics literature. Fitting linear BDI processes to panel data is a nontrivial optimization problem because birth and death rates can be functions of many parameters related to the covariates of interest. We propose a novel expectation-maximization (EM) algorithm for fitting linear BDI models with covariates to panel data. We derive a closed-form expression for the joint generating function of some of the BDI process statistics and use this generating function to reduce the E-step of the EM algorithm, as well as the calculation of the Fisher information, to one-dimensional integration. This analytical technique yields a computationally efficient and robust optimization algorithm that we implemented in an open-source R package. We apply our method to DNA fingerprinting of Mycobacterium tuberculosis, the causative agent of tuberculosis, to study the intrapatient time evolution of IS copy number, a genetic marker frequently used in estimating epidemiological clusters of infections. Our analysis reveals previously undocumented differences in IS birth-death rates among three major lineages of M. tuberculosis, which has important implications for epidemiologists who use IS for DNA fingerprinting of M. tuberculosis.
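The abstract states that a closed-form generating function reduces the required computations to one-dimensional integration. The Python below is a minimal sketch of that general idea only. It is not the paper's derivation and not the R package's code. It assumes a linear BDI process with per-capita birth rate λ (`lam`), per-capita death rate μ (`mu`) and a constant immigration rate ν (`nu`), with λ > 0 and λ ≠ μ. It uses the textbook closed-form probability generating function (PGF) of that process and recovers transition probabilities P(X(t) = j | X(0) = i) by discretizing the one-dimensional integral over the unit circle. Transition probabilities are what a panel-data likelihood needs between consecutive observation times. The E-step expectations of sufficient statistics described in the paper are not computed here, and all function names are hypothetical.

```python
import cmath
import math


def bd_pgf_single(s: complex, t: float, lam: float, mu: float) -> complex:
    """PGF at time t of a linear birth-death process started from one individual.

    Textbook closed form; assumes lam != mu (hypothetical helper, not from the paper).
    """
    e = math.exp(-(lam - mu) * t)
    num = mu * (1 - s) - (mu - lam * s) * e
    den = lam * (1 - s) - (mu - lam * s) * e
    return num / den


def immigration_pgf(s: complex, t: float, lam: float, mu: float, nu: float) -> complex:
    """PGF at time t of the population founded by immigrants (rate nu), started from 0.

    Equals exp(nu * integral_0^t (G(s, tau) - 1) dtau); assumes lam > 0 and lam != mu.
    """
    r = lam - mu
    ratio = (lam * (1 - s) * math.exp(r * t) - (mu - lam * s)) / r
    # ratio equals 1 at t = 0 and never reaches the negative real axis for |s| = 1,
    # so the principal branch of the complex logarithm is the continuous one here.
    return cmath.exp(-(nu / lam) * cmath.log(ratio))


def bdi_transition_probs(i: int, t: float, lam: float, mu: float, nu: float,
                         n: int = 256) -> list[float]:
    """Approximate P(X(t) = j | X(0) = i) for j = 0..n-1 by inverting the PGF.

    P_ij(t) = (1 / 2pi) * integral of G_i(e^{i theta}, t) e^{-i j theta} dtheta,
    discretized as a Riemann sum over n equally spaced points on the unit circle.
    The aliasing error is small only if P(X(t) >= n) is negligible.
    """
    # PGF of the full process: i independent founder lineages times the immigrant part.
    values = []
    for k in range(n):
        s = cmath.exp(2j * math.pi * k / n)
        values.append(bd_pgf_single(s, t, lam, mu) ** i
                      * immigration_pgf(s, t, lam, mu, nu))
    # Inverse discrete Fourier transform; the imaginary parts are round-off noise.
    probs = []
    for j in range(n):
        total = sum(values[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        probs.append((total / n).real)
    return probs
```

As a sanity check, a pure birth (Yule) process (μ = ν = 0) started from one individual has a geometric distribution at time t, P(X(t) = j) = e^{-λt}(1 - e^{-λt})^{j-1}, and `bdi_transition_probs(1, t, lam, 0.0, 0.0)` should reproduce it. The truncation `n = 256` is an arbitrary illustrative choice: it must exceed the range of population sizes with non-negligible probability, otherwise mass from large j aliases back onto small j.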