Hagberg Katrina Wilcox, Vasilakis-Scaramozza Catherine, Persson Rebecca, Neasham David, Kafatos George, Jick Susan
Epidemiology, Boston Collaborative Drug Surveillance Program, Lexington, MA, USA.
Center for Observational Research, Amgen Ltd, Uxbridge, UK.
Clin Epidemiol. 2023 Dec 16;15:1193-1206. doi: 10.2147/CLEP.S434829. eCollection 2023.
To evaluate the new Clinical Practice Research Datalink (CPRD) Aurum database, we estimated 'correctness' (ie accuracy, validity) and 'completeness' (ie presence, missingness) of malignant breast cancer diagnoses recorded in CPRD Aurum compared to external linked data sources: Hospital Episode Statistics (HES) Admitted Patient Care (APC), HES Outpatient (OP), and Cancer Registry (CR), and to the previously validated CPRD GOLD.
Linkage-eligible female patients with an incident malignant breast cancer diagnosis recorded in at least one study data source were selected. Correctness was the proportion of malignant breast cancer cases recorded in CPRD Aurum or GOLD who also had a diagnosis recorded in HES APC/OP (2004-2019) or CR (2004-2016). Completeness was estimated by identifying all malignant breast cancer diagnoses in HES APC/OP or CR and calculating the proportion that had a concordant diagnosis in CPRD Aurum or GOLD.
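The two measures differ only in their denominator. Correctness divides the concordant cases by all cases in the primary care database (CPRD Aurum or GOLD). Completeness divides them by all cases in the linked reference source (HES APC/OP or CR). The sketch below is a simplified illustration, not the authors' code. It assumes each source has already been reduced to a set of hypothetical patient identifiers for eligible incident cases, and it ignores the study's date windows and eligibility rules.

```python
from typing import Set


def correctness(primary_care: Set[str], reference: Set[str]) -> float:
    """Proportion of cases in the primary care source (e.g. CPRD Aurum or GOLD)
    that also have a diagnosis in the linked reference source (e.g. HES or CR)."""
    if not primary_care:
        raise ValueError("primary care source contains no cases")
    return len(primary_care & reference) / len(primary_care)


def completeness(primary_care: Set[str], reference: Set[str]) -> float:
    """Proportion of cases in the linked reference source that have a
    concordant diagnosis in the primary care source."""
    if not reference:
        raise ValueError("reference source contains no cases")
    return len(primary_care & reference) / len(reference)


# Hypothetical toy patient IDs, for illustration only (not study data).
aurum_cases = {"p1", "p2", "p3", "p4"}
hes_cases = {"p1", "p2", "p3", "p5", "p6"}

print(correctness(aurum_cases, hes_cases))   # 3 of 4 Aurum cases confirmed
print(completeness(aurum_cases, hes_cases))  # 3 of 5 HES cases captured
```

Because the numerator is the same in both measures, a source can score high on one and lower on the other. This is why the abstract reports correctness and completeness separately for each comparison.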
Compared to HES APC/OP, there were 85,659 and 31,452 eligible patients in CPRD Aurum and GOLD, respectively. Correctness estimates were high (CPRD Aurum 83.5%, GOLD 81.7%). Compared to CR, there were 70,190 and 29,597 eligible patients in CPRD Aurum and GOLD, respectively: correctness was 89.1% for CPRD Aurum and 88.2% for GOLD. Completeness estimates for CPRD Aurum and GOLD were high (>90%). Diagnoses were recorded in CPRD Aurum within -7 to 74 days of those in the linked sources. Reasons for discordant diagnostic coding included presence of treatment or other clinical codes only, diagnosis coded after end of follow-up, non-malignant breast cancer in linked data, and administrative codes in lieu of diagnostic codes.
These results indicate that the correctness and completeness of malignant breast cancer diagnoses in CPRD Aurum were high and similar to those in CPRD GOLD. This provides confidence in the use of CPRD Aurum for research purposes. Where complete case capture is important, researchers should consider linkage to HES APC or CR.