Lyons Ronan A, Jones Kerina H, John Gareth, Brooks Caroline J, Verplancke Jean-Philippe, Ford David V, Brown Ginevra, Leake Ken
Health Information Research Unit, Centre for Health Information Research & Evaluation, School of Medicine, Swansea University, Swansea, Wales, UK.
BMC Med Inform Decis Mak. 2009 Jan 16;9:3. doi: 10.1186/1472-6947-9-3.
Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic patient record systems have the potential to revolutionise both service delivery and research. To achieve this, however, the ability to link data at the individual record level must be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been built from disparate datasets. To date, over 500 million records from multiple health and social care service providers have been loaded, and the databank continues to grow.
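The abstract does not say how records stay linkable once identifiers are anonymised. One common approach is a keyed hash (HMAC) of the identifier. The sketch below is a minimal illustration under that assumption and does not describe how SAIL actually derives its linking field. The key, the function name `pseudonymise` and the sample NHS numbers are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key. In a real deployment it would be held by a trusted
# party and never stored alongside the linked data.
SECRET_KEY = b"hypothetical-key-held-by-trusted-party"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonymous linking value from a personal identifier.

    The same identifier and key always yield the same value, so records from
    different providers can still be joined on it. Without the key, nobody can
    compute the pseudonym for a guessed identifier.
    """
    normalised = identifier.replace(" ", "")  # "943 476 5919" -> "9434765919"
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Two providers record the same (hypothetical) person with different formatting.
gp_value = pseudonymise("943 476 5919")
hospital_value = pseudonymise("9434765919")
print(gp_value == hospital_value)  # True
```

The key is what makes this safe. There are only about 10^10 ten-digit numbers, so an unkeyed hash such as plain SHA-256 could be reversed by brute force.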
With the databank infrastructure established, the aim of this work was to develop and implement an accurate matching process for assigning a unique Anonymous Linking Field (ALF) to person-based records, making the databank ready for record-linkage research studies. An SQL-based matching algorithm, MACRAL (Matching Algorithm for Consistent Results in Anonymised Linkage), was developed for this purpose. First, MACRAL was used to assess the suitability of a valid NHS number as the basis of a unique identifier. Second, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), in order to assess the efficacy of this process and to identify the optimum matching technique.
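The abstract does not describe MACRAL's matching rules. The sketch below is a hypothetical, simplified illustration of one SQL-based deterministic step in this spirit. Each NHS number is first checked with the standard NHS modulus 11 check-digit test. A record then receives the register's linking field only if it also agrees exactly on NHS number, date of birth and sex. The table and column names, the ALF values and the exact-agreement rule are all assumptions. The example uses Python's built-in `sqlite3`, not whatever database system the authors used.

```python
import sqlite3


def nhs_number_is_valid(nhs_number):
    """Return True if a 10-digit NHS number passes the modulus 11 check-digit test."""
    digits = (nhs_number or "").replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        return False
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:  # this stem has no valid check digit
        return False
    return check == int(digits[9])


conn = sqlite3.connect(":memory:")
# Expose the validity check to SQL; it returns 0 or 1.
conn.create_function("nhs_valid", 1, lambda s: int(nhs_number_is_valid(s)))

# Hypothetical register and incoming dataset.
conn.executescript("""
    CREATE TABLE register (nhs_number TEXT, dob TEXT, sex TEXT, alf INTEGER);
    CREATE TABLE dataset  (rec_id INTEGER, nhs_number TEXT, dob TEXT, sex TEXT);
""")
conn.executemany("INSERT INTO register VALUES (?, ?, ?, ?)", [
    ("9434765919", "1970-01-01", "F", 1001),
    ("4010232137", "1985-06-15", "M", 1002),
])
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)", [
    (1, "9434765919", "1970-01-01", "F"),  # exact agreement
    (2, "4010232137", "1985-06-15", "M"),  # exact agreement
    (3, "4010232138", "1985-06-15", "M"),  # fails the check digit
    (4, "9434765919", "1999-12-31", "F"),  # valid number, but date of birth disagrees
])

# Deterministic stage: a record receives the register's ALF only when its NHS
# number is valid and it agrees exactly on NHS number, date of birth and sex.
rows = conn.execute("""
    SELECT d.rec_id, r.alf
    FROM dataset AS d
    LEFT JOIN register AS r
      ON nhs_valid(d.nhs_number) = 1
     AND r.nhs_number = d.nhs_number
     AND r.dob = d.dob
     AND r.sex = d.sex
    ORDER BY d.rec_id
""").fetchall()

assigned = {rec_id: alf for rec_id, alf in rows if alf is not None}
unmatched = [rec_id for rec_id, alf in rows if alf is None]
print(assigned)   # {1: 1001, 2: 1002}
print(unmatched)  # [3, 4]
```

Records left unmatched (here 3 and 4) would be candidates for a later, more tolerant stage, such as probabilistic matching on names, date of birth and address.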
Validation of the NHS number as an identifier, using probabilistic record linkage (PRL) at the 50% threshold, yielded specificity values > 99.8% and sensitivity values > 94.6%, with error rates < 0.2%. Of the range of techniques applied to match datasets to the NHSAR, the optimum technique achieved sensitivity values of 99.9% for a general practitioner (GP) dataset from primary care, 99.3% for a Patient Episode Database for Wales (PEDW) dataset from secondary care, and 95.2% for the PARIS database from social care.
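The abstract reports PRL results at a 50% threshold but does not give the underlying model. As a hedged illustration, the sketch below uses the classic Fellegi-Sunter formulation. Each field contributes a log-likelihood-ratio weight, and the weights are added to the prior log-odds. The total is then converted to a match probability, and the pair is accepted as a link at 0.5. The field names, m- and u-probabilities and prior are hypothetical, and this is not the authors' PRL implementation. Sensitivity is computed as TP/(TP+FN) and specificity as TN/(TN+FP). The abstract's error rate is not reproduced because its definition is not given here.

```python
import math

# Hypothetical m- and u-probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "forename": (0.90, 0.02),
    "dob": (0.98, 0.001),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.001),
}
PRIOR_MATCH = 1e-4  # hypothetical prior probability that a candidate pair matches


def match_probability(agreements):
    """Posterior probability that a candidate pair is a true match."""
    log_odds = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        # Add the Fellegi-Sunter agreement or disagreement weight (a log-likelihood ratio).
        log_odds += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))


def is_link(agreements, threshold=0.5):
    """Accept the pair as a link when the match probability reaches the threshold."""
    return match_probability(agreements) >= threshold


def sensitivity_specificity(predicted, actual):
    """Compare predicted links with known true-match status."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


all_agree = {f: True for f in FIELD_PARAMS}
name_change = dict(all_agree, surname=False)  # e.g. surname changed after marriage
only_sex = {f: f == "sex" for f in FIELD_PARAMS}
print(is_link(all_agree), is_link(name_change), is_link(only_sex))  # True True False

# Hypothetical labelled pairs for illustrating the two metrics.
sens, spec = sensitivity_specificity(
    predicted=[True, True, False, False, True],
    actual=[True, True, True, False, False],
)
print(round(sens, 3), round(spec, 3))  # 0.667 0.5
```

Working on the log-odds scale keeps field weights additive. Raising the threshold above 0.5 would trade sensitivity for specificity, and lowering it would do the reverse.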
With this infrastructure in place, the matching process developed here allows an ALF to be allocated reliably and consistently to records in the databank. The SAIL databank therefore provides a research-ready platform for record-linkage studies.