Centre for Data Linkage, Curtin University, Perth, Western Australia.
BMC Med Inform Decis Mak. 2014 Mar 31;14:23. doi: 10.1186/1472-6947-14-23.
Record linkage techniques are widely used to give health researchers event-based longitudinal information for entire populations. The task of record linkage is increasingly being undertaken by specialised linkage units (SLUs). In addition to the complexity of undertaking probabilistic record linkage, these units face further technical challenges in providing record linkage 'as a service' for research. The extent of the functionality this requires, and approaches to solving these issues, have received little attention in the record linkage literature. Few, if any, of the record linkage packages or systems currently used by SLUs include the full range of functions required.
This paper identifies and discusses some of the functions that are required or undertaken by SLUs in the provision of record linkage services. These include managing routine, ongoing linkage; storing and handling changing data; handling different linkage scenarios; and accommodating ever-increasing datasets. Automated linkage processes are one way of ensuring consistency of results and scalability of service.
Alternative solutions to some of these challenges are presented. Maintaining a full history of links and storing pairwise information resolves many of the challenges around handling 'open' records and providing automated, managed extractions. A number of these solutions were implemented as part of the development of the National Linkage System (NLS) by the Centre for Data Linkage (part of the Population Health Research Network) in Australia.
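As an illustration only, the idea of keeping a full link history as pairwise information can be sketched as follows. This is not the NLS implementation: the `PairLink` structure, the `groups_as_of` function, the integer load sequence numbers and the record identifiers are all hypothetical names chosen for the example. Each link between two records is stored with the load at which it was created and, if it was later broken, the load at which it was retired. The groups of records belonging to one person at any point in time are then derived on demand with a union-find pass over the links that were active at that time.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PairLink:
    """One pairwise link between two records (hypothetical structure)."""
    rec_a: str
    rec_b: str
    added: int                     # load sequence number when the link was created
    removed: Optional[int] = None  # load sequence number when retired; None = still active


def groups_as_of(records: Iterable[str], links: List[PairLink], as_of: int) -> Dict[str, str]:
    """Map each record to a group id using only links active at load `as_of`.

    The group id is the smallest record id in the group, so the result is
    deterministic. Every record named in a link must appear in `records`.
    """
    parent = {r: r for r in records}

    def find(x: str) -> str:
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link in links:
        active = link.added <= as_of and (link.removed is None or link.removed > as_of)
        if not active:
            continue
        root_a, root_b = find(link.rec_a), find(link.rec_b)
        if root_a != root_b:
            # Keep the smaller id as the root of the merged group.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    return {r: find(r) for r in parent}


# Hypothetical history: r1-r2 linked at load 1; r2-r3 linked at load 2 and
# retired at load 4 (for example, after a data correction).
records = ["r1", "r2", "r3"]
links = [
    PairLink("r1", "r2", added=1),
    PairLink("r2", "r3", added=2, removed=4),
]

snapshot_load_3 = groups_as_of(records, links, as_of=3)
snapshot_load_4 = groups_as_of(records, links, as_of=4)
```

Because no link is ever deleted, an extraction run for a project at load 3 can be reproduced later even though the groups have since changed, which is one way the history could support repeatable, automated extractions.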
The demand for, and complexity of, linkage services are growing. This presents a challenge to SLUs as they seek to service the varying needs of dozens of research projects annually. Linkage units need to be both flexible and scalable to meet this demand. It is hoped that the solutions presented here can help mitigate these difficulties.