Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA.
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA.
Biostatistics. 2023 Dec 15;25(1):117-133. doi: 10.1093/biostatistics/kxac049.
Disease incidence data in a nationwide cohort study would ideally be obtained through a national disease registry. Unfortunately, no such registry currently exists in the United States. Instead, results from individual state registries must be combined to ascertain disease diagnoses nationwide. The National Cancer Institute has initiated a program to assemble all state registries in order to provide a complete assessment of all cancers in the United States. However, not all registries have agreed to participate. In this article, we develop an imputation-based approach that uses self-reported cancer diagnoses from longitudinally collected questionnaires to impute cancer incidence not covered by the combined registry. We propose a two-step procedure. In the first step, a mover-stayer model is used to impute a participant's registry coverage status, which is reported only at the times of the questionnaires, given at 10-year intervals, and at the time of the last-alive vital status and death. In the second step, we propose a semiparametric working model, fit using an imputed coverage area sample identified from the mover-stayer model, to impute registry-based survival outcomes for participants in areas not covered by the registry. Simulation studies show that the approach performs well compared with alternative ad hoc approaches to this problem. We illustrate the methodology with an analysis that links the United States Radiologic Technologists study cohort with the combined registry, which includes 32 of the 50 states.
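To make the first step more concrete, the following is a minimal sketch of a generic two-state mover-stayer model, not the authors' implementation. It assumes a discrete yearly time scale, two states (0 = outside the registry coverage area, 1 = inside), a per-state stayer probability, and a time-homogeneous Markov chain for movers. The function names and all numeric values are hypothetical and chosen only for illustration. Under these assumptions, the probability of an unobserved coverage status in a year between two questionnaires can be computed from the statuses reported at the two questionnaires.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


def mover_stayer_transition(stayer_prob, mover_matrix, steps):
    """Transition probabilities over `steps` years under a mover-stayer mixture.

    A participant starting in state i is a stayer with probability
    stayer_prob[i] (never leaves i) and otherwise a mover who follows
    the Markov chain given by mover_matrix.
    """
    n = len(mover_matrix)
    mk = mat_pow(mover_matrix, steps)
    return [[(stayer_prob[i] if i == j else 0.0)
             + (1.0 - stayer_prob[i]) * mk[i][j]
             for j in range(n)] for i in range(n)]


def intermediate_state_probs(start, end, t, k, stayer_prob, mover_matrix):
    """Distribution of the state at year t given states at years 0 and k.

    `start` and `end` are the states reported at two consecutive
    questionnaires that are k years apart, with 0 <= t <= k.
    """
    n = len(mover_matrix)
    first = mat_pow(mover_matrix, t)      # mover path from year 0 to year t
    rest = mat_pow(mover_matrix, k - t)   # mover path from year t to year k
    s = stayer_prob[start]
    weights = []
    for j in range(n):
        # A stayer can only contribute when nothing changed and j == start.
        stay = s if (start == end and j == start) else 0.0
        move = (1.0 - s) * first[start][j] * rest[j][end]
        weights.append(stay + move)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed pair has zero probability under the model")
    return [w / total for w in weights]


# Hypothetical parameters, for illustration only.
stayer_prob = [0.6, 0.7]
mover_matrix = [[0.9, 0.1],
                [0.2, 0.8]]

# Coverage status probabilities 4 years after a questionnaire, for a
# participant who reported living in a covered area at both questionnaires.
probs = intermediate_state_probs(1, 1, 4, 10, stayer_prob, mover_matrix)
print(probs)
```

In an imputation setting, a coverage status for each unobserved year could then be drawn from such a distribution. Estimating the stayer probabilities and the mover transition matrix from data is a separate problem and is not shown here.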