Biostatistics Branch, Division of Cancer and Epidemiology, National Cancer Institute, Rockville, Maryland, USA.
Clinical Genetics Branch, Division of Cancer and Epidemiology, National Cancer Institute, Rockville, Maryland, USA.
Stat Med. 2021 Jul 10;40(15):3460-3476. doi: 10.1002/sim.8977. Epub 2021 Apr 12.
Hidden Markov models (HMMs) have been proposed to model the natural history of diseases while accounting for misclassification in state identification. We introduce a discrete-time HMM for human papillomavirus (HPV) and cervical precancer/cancer in which the hidden and observed state spaces are defined by all possible combinations of HPV, cytology, and colposcopy results. Because the population of women undergoing cervical cancer screening is heterogeneous with respect to sexual behavior, and therefore with respect to the risk of HPV acquisition and subsequent precancers, we use a mover-stayer mixture model that assumes a proportion of the population will stay in the healthy state and is not subject to disease progression. As each state is a combination of three distinct tests that characterize the cervix, partially observed data arise when at least one but not every test is observed. The standard forward-backward algorithm, used for evaluating the E-step within the E-M algorithm for maximum-likelihood estimation of HMMs, cannot incorporate time points with partially observed data. We propose a new forward-backward algorithm that considers all possible fully observed states that could have occurred across a participant's follow-up visits. We apply our method to data from a large management trial for women with low-grade cervical abnormalities. In our simulation study, our method had relatively little bias and outperformed simpler methods, which resulted in larger bias.
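To make the handling of partially observed visits concrete, the following is a minimal sketch of one way a forward recursion could accommodate them: the emission probability of a partial result is taken as the sum over every fully observed state that agrees with the tests actually performed. This is an illustration under stated assumptions, not the authors' algorithm. Each test is treated as binary, hidden state 0 is taken to be the healthy state occupied by stayers, and all names (`STATES`, `compatible_states`, `emission_prob`, `forward_likelihood`, `mover_stayer_likelihood`) and all parameter values are hypothetical.

```python
from itertools import product

# Assumption: each test (HPV, cytology, colposcopy) is binary, so a fully
# observed state is a triple such as (1, 0, 0). The paper's states may differ.
STATES = list(product((0, 1), repeat=3))


def compatible_states(partial):
    """Indices of fully observed states that agree with a partial result.

    `partial` is a triple in which None marks a test that was not performed.
    """
    return [
        i for i, state in enumerate(STATES)
        if all(p is None or p == s for p, s in zip(partial, state))
    ]


def emission_prob(emission, hidden, partial):
    """Sum emission probabilities over all compatible fully observed states."""
    return sum(emission[hidden][o] for o in compatible_states(partial))


def forward_likelihood(initial, transition, emission, visits):
    """Forward recursion returning the likelihood of the visit sequence."""
    n = len(STATES)
    alpha = [initial[h] * emission_prob(emission, h, visits[0]) for h in range(n)]
    for partial in visits[1:]:
        alpha = [
            emission_prob(emission, j, partial)
            * sum(alpha[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


def mover_stayer_likelihood(p_stayer, initial, transition, emission, visits):
    """Mixture of a stayer (always in hidden state 0) and a mover HMM."""
    stayer = 1.0
    for partial in visits:
        stayer *= emission_prob(emission, 0, partial)
    mover = forward_likelihood(initial, transition, emission, visits)
    return p_stayer * stayer + (1.0 - p_stayer) * mover


# Toy parameters, made up for illustration (not estimates from the paper).
n = len(STATES)
initial = [1.0 / n] * n
transition = [[1.0 / n] * n for _ in range(n)]
emission = [[0.86 if h == o else 0.02 for o in range(n)] for h in range(n)]

# Three visits: colposcopy missing at the first, only HPV observed at the second.
visits = [(1, 0, None), (1, None, None), (0, 0, 0)]
likelihood = mover_stayer_likelihood(0.3, initial, transition, emission, visits)
```

A visit with no tests performed contributes an emission probability of 1 for every hidden state, so it drops out of the likelihood, which is the behavior one would expect from marginalizing over unobserved results. A full E-step would also need a matching backward pass, which this sketch omits.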