Clinical Practice Research Datalink (CPRD), MHRA, 10 South Colonnade, Canary Wharf, London, E14 4PU, UK.
NHS Digital, 1 Trevelyan Square, Boar Lane, Leeds, LS1 6AE, UK.
Eur J Epidemiol. 2019 Jan;34(1):91-99. doi: 10.1007/s10654-018-0442-4. Epub 2018 Sep 15.
Record linkage is increasingly used to expand the information available for public health research. An understanding of record linkage methods and their strengths and limitations is important for robust analysis and interpretation of linked data. Here, we describe the approach used by the Clinical Practice Research Datalink (CPRD) to link primary care data to other patient-level datasets, and the potential implications of this approach for CPRD data analysis. General practice electronic health record software providers separately submit de-identified data to CPRD and patient identifiers to NHS Digital, excluding patients who have opted out of contributing data. Data custodians for external datasets also send patient identifiers to NHS Digital. NHS Digital uses these identifiers to link the datasets with an 8-stage deterministic methodology. CPRD subsequently receives a de-identified linked cohort file and provides researchers with anonymised linked data and metadata detailing the linkage process. This methodology has been used to generate routine primary care linked datasets, including data from Hospital Episode Statistics, the Office for National Statistics and the National Cancer Registration and Analysis Service. In June 2018, 10.6 million (M) patients from 411 English general practices were included in record linkage. Of these, 9.1M (86%) were of research quality, of whom 8.0M (88%) had a valid NHS number and were eligible for linkage in the CPRD standard linked dataset release. Linking CPRD data to other sources improves the range and validity of research studies. This manuscript, together with metadata generated on match strength and linkage eligibility, can be used to inform study design and explore potential linkage-related selection and misclassification biases.
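To illustrate what a staged deterministic linkage looks like in general, the following is a minimal Python sketch. It is not the NHS Digital algorithm: the abstract does not list the eight matching rules, so the three stages, the `Person` fields, the `link` function and the sample records are all hypothetical. Each stage requires exact agreement on a set of identifiers, stages run from strictest to loosest, and the stage at which a record matches is kept as a rough analogue of match-strength metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    record_id: str
    nhs_number: Optional[str]
    dob: Optional[str]
    sex: Optional[str]
    postcode: Optional[str]


# Hypothetical stages, strictest first. Each tuple names the fields that
# must all be present and exactly equal for two records to match.
STAGES = [
    ("nhs_number", "dob", "sex", "postcode"),
    ("nhs_number", "dob", "sex"),
    ("dob", "sex", "postcode"),
]


def key(person, fields):
    """Build a match key, or None if any required field is missing."""
    values = tuple(getattr(person, f) for f in fields)
    return None if any(v is None for v in values) else values


def link(source, target, stages=STAGES):
    """Return {source_id: (target_id, stage_number)} for unique matches."""
    matches = {}
    for stage_no, fields in enumerate(stages, start=1):
        # Index target records by this stage's key.
        index = {}
        for t in target:
            k = key(t, fields)
            if k is not None:
                index.setdefault(k, []).append(t.record_id)
        for s in source:
            if s.record_id in matches:
                continue  # already linked at a stricter stage
            k = key(s, fields)
            candidates = index.get(k, []) if k is not None else []
            if len(candidates) == 1:  # accept unambiguous matches only
                matches[s.record_id] = (candidates[0], stage_no)
    return matches


# Invented sample records, for illustration only.
primary_care = [
    Person("p1", "111", "1980-01-01", "F", "AB1 2CD"),
    Person("p2", "222", "1975-05-05", "M", None),
    Person("p3", None, "1990-09-09", "F", "ZZ9 9ZZ"),
    Person("p4", None, "2000-01-01", "M", "XX1 1XX"),
]
hospital = [
    Person("h1", "111", "1980-01-01", "F", "AB1 2CD"),
    Person("h2", "222", "1975-05-05", "M", "CD3 4EF"),
    Person("h3", None, "1990-09-09", "F", "ZZ9 9ZZ"),
]

result = link(primary_care, hospital)
```

In this sketch, records that match only at a later (looser) stage carry a higher stage number, which an analyst could use to run sensitivity analyses restricted to the strictest matches. Records left unmatched, such as `p4` here, are the kind of exclusion that can introduce linkage-related selection bias.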