Lange Jane M, Hubbard Rebecca A, Inoue Lurdes Y T, Minin Vladimir N
Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A.
Biostatistics Unit, Group Health Research Institute, Seattle, Washington, U.S.A.
Biometrics. 2015 Mar;71(1):90-101. doi: 10.1111/biom.12252. Epub 2014 Oct 15.
Multistate models are used to characterize individuals' natural histories through diseases with discrete states. Observational data resources based on electronic medical records pose new opportunities for studying such diseases. However, these data consist of observations of the process at discrete sampling times, which may be either pre-scheduled and non-informative or symptom-driven and informative about an individual's underlying disease status. We have developed a novel joint observation and disease transition model for this setting. The disease process is modeled as a latent continuous-time Markov chain, and the observation process as a Markov-modulated Poisson process with observation rates that depend on the individual's underlying disease status. The disease process is observed at a combination of informative and non-informative sampling times, with possible misclassification error. We demonstrate that the model is computationally tractable and devise an expectation-maximization algorithm for parameter estimation. Using simulated data, we show that our joint observation and disease transition model yields less biased and more precise estimates of the disease rate parameters. We apply the model to a study of secondary breast cancer events, utilizing mammography and biopsy records from a sample of women with a history of primary breast cancer.
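The data-generating structure described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-state rate matrix, the observation rates, and the misclassification matrix are all hypothetical values chosen for illustration. It simulates a latent continuous-time Markov chain, draws informative visit times from a Poisson process whose rate depends on the current latent state, and records each visit with possible misclassification.

```python
import random


def simulate_ctmc(Q, start, t_end, rng):
    """Simulate a CTMC path on [0, t_end); returns [(jump_time, state), ...]."""
    path = [(0.0, start)]
    t, s = 0.0, start
    while True:
        rate = -Q[s][s]              # total rate of leaving state s
        if rate <= 0:                # absorbing state: no further jumps
            break
        t += rng.expovariate(rate)   # exponential sojourn time
        if t >= t_end:
            break
        others = [j for j in range(len(Q)) if j != s]
        weights = [Q[s][j] for j in others]
        s = rng.choices(others, weights=weights)[0]
        path.append((t, s))
    return path


def state_at(path, t):
    """Latent state occupied at time t."""
    s = path[0][1]
    for jump_time, state in path:
        if jump_time > t:
            break
        s = state
    return s


def informative_times(path, obs_rates, t_end, rng):
    """Markov-modulated Poisson process: visits arrive at rate
    obs_rates[state] while the latent chain sits in that state."""
    times = []
    bounds = path + [(t_end, None)]
    for (t0, s), (t1, _) in zip(bounds, bounds[1:]):
        lam = obs_rates[s]
        if lam <= 0:
            continue
        t = t0 + rng.expovariate(lam)
        while t < t1:
            times.append(t)
            t += rng.expovariate(lam)
    return times


def observe(path, times, misclass, rng):
    """Record the state at each visit; misclass[i][j] is the assumed
    probability of reporting state j when the true state is i."""
    records = []
    for t in sorted(times):
        true_state = state_at(path, t)
        labels = list(range(len(misclass[true_state])))
        seen = rng.choices(labels, weights=misclass[true_state])[0]
        records.append((t, seen))
    return records


if __name__ == "__main__":
    rng = random.Random(0)
    Q = [[-0.2, 0.2], [0.0, 0.0]]            # hypothetical: 0 = healthy, 1 = diseased (absorbing)
    obs_rates = [0.1, 1.5]                   # hypothetical: symptoms raise the visit rate
    misclass = [[0.95, 0.05], [0.10, 0.90]]  # hypothetical error rates
    scheduled = [1.0, 2.0, 3.0, 4.0]         # pre-scheduled, non-informative visits
    path = simulate_ctmc(Q, 0, 5.0, rng)
    visits = scheduled + informative_times(path, obs_rates, 5.0, rng)
    print(observe(path, visits, misclass, rng))
```

Because the informative visit times cluster in the state with the higher observation rate, an analysis that treats every visit as non-informative ignores information about the latent state, which is the situation the joint model is meant to address.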