A new computationally efficient algorithm for record linkage with field dependency and missing data imputation.

Affiliations

Clinical Research Facility, National University of Ireland, Galway, Ireland.

Graduate Entry Medical School, University of Limerick, Ireland.

Publication information

Int J Med Inform. 2018 Jan;109:70-75. doi: 10.1016/j.ijmedinf.2017.10.021. Epub 2017 Nov 6.

PMID: 29195708

Abstract

Record linkage algorithms aim to identify pairs of records, drawn from two or more datasets, that correspond to the same individual. In general, fields that are common to both datasets are compared to determine which record pairs to link. The classic model for probabilistic linkage was proposed by Fellegi and Sunter. It assumes that the individual fields common to both datasets are completely observed, and that the field agreement indicators are conditionally independent within the subsets of record pairs corresponding to the same and to differing individuals. Herein, we propose a novel record linkage algorithm that does not rely on either of these two baseline assumptions. We demonstrate improved performance of the algorithm in the presence of missing data and correlation patterns between the agreement indicators. The algorithm is computationally efficient and can be used to link large databases consisting of millions of record pairs. An R package, corlink, has been developed to implement the new algorithm and can be downloaded from the CRAN repository.
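
For readers unfamiliar with the baseline being relaxed, the following is a minimal Python sketch of the classic Fellegi and Sunter match weight under the conditional independence assumption. It is not the algorithm proposed in the paper and does not use the corlink package. The field names, the m- and u-probabilities, and the function `match_weight` are hypothetical, chosen only for illustration. Missing fields are simply skipped, which is a common simplification and not the imputation approach the abstract refers to.

```python
import math
from typing import Dict, Optional

# Hypothetical probabilities, for illustration only.
# m = P(field agrees | both records belong to the same individual)
# u = P(field agrees | records belong to different individuals)
M_PROBS = {"surname": 0.95, "birth_year": 0.98, "postcode": 0.90}
U_PROBS = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}


def match_weight(agreement: Dict[str, Optional[bool]]) -> float:
    """Sum per-field log2 likelihood ratios for one record pair.

    agreement maps a field name to True (agree), False (disagree),
    or None (value missing in at least one record). Summing the
    per-field terms is only valid if the agreement indicators are
    conditionally independent given match status.
    """
    total = 0.0
    for field, agrees in agreement.items():
        if agrees is None:
            # Assumption: a missing comparison contributes no evidence.
            continue
        m, u = M_PROBS[field], U_PROBS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Three example record pairs that differ only in the postcode comparison.
full_agreement = match_weight(
    {"surname": True, "birth_year": True, "postcode": True}
)
one_missing = match_weight(
    {"surname": True, "birth_year": True, "postcode": None}
)
one_disagree = match_weight(
    {"surname": True, "birth_year": True, "postcode": False}
)
```

In this sketch a higher weight means stronger evidence that the pair refers to the same individual, so the pair with full agreement scores above the pair with a missing postcode, which in turn scores above the pair with a disagreeing postcode. When agreement indicators are correlated, summing the terms in this way counts shared evidence more than once, which is one motivation for relaxing the independence assumption.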

Similar articles

The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection.

J Med Internet Res. 2022 Sep 29;24(9):e33775. doi: 10.2196/33775.

Improving record linkage performance in the presence of missing linkage data.

J Biomed Inform. 2014 Dec;52:43-54. doi: 10.1016/j.jbi.2014.01.016. Epub 2014 Feb 10.

Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.

J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.

Automated linkage of patient records from disparate sources.

Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.

Variable selection for latent class analysis in the presence of missing data with application to record linkage.

Stat Methods Med Res. 2024 Jun;33(6):966-980. doi: 10.1177/09622802241242317. Epub 2024 Apr 9.

Probabilistic linkage of large public health data files.

Stat Med. 1995;14(5-7):491-8. doi: 10.1002/sim.4780140510.

Comparing record linkage software programs and algorithms using real-world data.

PLoS One. 2019 Sep 24;14(9):e0221459. doi: 10.1371/journal.pone.0221459. eCollection 2019.

Controlling false match rates in record linkage using extreme value theory.

J Biomed Inform. 2011 Aug;44(4):648-54. doi: 10.1016/j.jbi.2011.02.008. Epub 2011 Feb 23.

Evaluation of record linkage methods for iterative insertions.

Methods Inf Med. 2009;48(5):429-37. doi: 10.3414/ME9238. Epub 2009 Aug 20.

Cited by

Prevalence of anaemia, iron, and vitamin deficiencies in the health system in the Republic of Ireland: a retrospective cohort study.

BJGP Open. 2024 Jul 29;8(2). doi: 10.3399/BJGPO.2023.0126. Print 2024 Jul.

Administrative records-based criterion measures.

Mil Psychol. 2023 Jul-Aug;35(4):351-363. doi: 10.1080/08995605.2022.2063614. Epub 2022 May 31.

Temporal trends in acute kidney injury across health care settings in the Irish health system: a cohort study.

Nephrol Dial Transplant. 2020 Mar 1;35(3):447-457. doi: 10.1093/ndt/gfy226.

Temporal trends in hyperuricaemia in the Irish health system from 2006-2014: A cohort study.

PLoS One. 2018 May 31;13(5):e0198197. doi: 10.1371/journal.pone.0198197. eCollection 2018.
