Murtha Cancer Center/Research Program, Uniformed Services University of the Health Sciences and Walter Reed National Military Medical Center, Bethesda, MD.
Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD.
JCO Clin Cancer Inform. 2020 Oct;4:906-917. doi: 10.1200/CCI.20.00043.
Linked cancer registry and medical claims data have increased the capacity for cancer research. However, few efforts have described methods for selecting information when the data sources differ, which may affect data use. We developed a systematic process to evaluate and consolidate cancer diagnosis and treatment information between the Department of Defense Central Cancer Registry (CCR) and the Military Health System Data Repository (MDR) administrative claims database, which are linked in the Military Cancer Epidemiology Data System (MilCanEpi).
MilCanEpi contains information on cancer diagnosis and treatment of patients receiving care from 1998 to 2014. We used an iterative process guided by knowledge of data features, current literature, and logical comparisons between the CCR and MDR data to evaluate and consolidate information on cancer diagnosis, receipt of treatment (yes or no), and the corresponding dates. We applied the process to breast cancer data as an example. Agreement between diagnosis and treatment dates in the two data sources was evaluated using Cohen's κ with 95% CIs.
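As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's κ for paired labels from two sources (for example, a date or a yes/no treatment flag per patient). It is not the authors' code: the function name `cohen_kappa`, the toy labels, and the confidence interval, which uses a simple large-sample approximation to the standard error, are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(a, b, z=1.96):
    """Return (kappa, observed_agreement, (ci_low, ci_high)) for paired labels.

    a, b: equal-length sequences of hashable labels, one pair per patient.
    The interval uses an approximate standard error (an assumption here),
    not necessarily the method used in the source study.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of pairs with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each source's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both sources use one identical label throughout; kappa is undefined.
        return float("nan"), p_o, (float("nan"), float("nan"))
    kappa = (p_o - p_e) / (1.0 - p_e)
    se = sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    return kappa, p_o, (kappa - z * se, kappa + z * se)


# Toy example with hypothetical labels (not study data).
registry = ["yes", "yes", "no", "no"]
claims = ["yes", "no", "no", "no"]
kappa, agreement, ci = cohen_kappa(registry, claims)
```

When dates are used as labels, each distinct date is its own category, so chance agreement is small and κ is close to the exact-agreement proportion.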
In MilCanEpi, we identified 15,965 patients with a breast cancer diagnosis and 15,145 patients who underwent breast cancer surgery; 97.9% and 84.1% of patients had records in both CCR and MDR for diagnosis and surgery, respectively. Exact agreement was 13.7% for diagnosis dates (Cohen's κ = 0.14; 95% CI, 0.13 to 0.14) and 68.9% for surgery dates (Cohen's κ = 0.69; 95% CI, 0.68 to 0.70) between the two data sources. After applying systematic processes, 98.1% of patients with a breast cancer diagnosis and 99.7% of patients with surgery had information selected for analytic data sets.
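The abstract does not state the selection rules themselves, so the sketch below only shows the general shape such a consolidation step could take. The rule used here (take the registry date when present, otherwise fall back to the claims date) is a hypothetical placeholder, and the names `consolidate_date` and `consolidation_rate` are likewise invented for illustration.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def consolidate_date(ccr: Optional[date], mdr: Optional[date]) -> Tuple[Optional[date], str]:
    """Pick one date per patient and record which source supplied it.

    Hypothetical rule, NOT taken from the source: prefer the registry (CCR)
    value, fall back to claims (MDR), and leave the field unresolved if
    neither source has a value.
    """
    if ccr is not None:
        return ccr, "CCR"
    if mdr is not None:
        return mdr, "MDR"
    return None, "none"


def consolidation_rate(pairs: Iterable[Tuple[Optional[date], Optional[date]]]) -> float:
    """Share of patients for whom a date could be selected."""
    results = [consolidate_date(ccr, mdr)[0] for ccr, mdr in pairs]
    if not results:
        raise ValueError("no patients supplied")
    return sum(d is not None for d in results) / len(results)


# Toy records with made-up dates (not study data).
records = [
    (date(2005, 3, 1), date(2005, 3, 4)),  # both sources present
    (None, date(2010, 7, 15)),             # claims only
    (None, None),                          # neither source
    (date(2012, 1, 9), None),              # registry only
]
rate = consolidation_rate(records)
```

Returning the source label alongside the selected value keeps the provenance of each consolidated field available for later sensitivity checks.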
The developed processes resulted in high consolidation rates of breast cancer data in MilCanEpi and may serve as a data selection template for other tumor sites and linked data sources.