Improving record linkage performance in the presence of missing linkage data.

Author information

Ong Toan C, Mannino Michael V, Schilling Lisa M, Kahn Michael G

Affiliations

University of Colorado, Denver, Business School, Denver, CO, USA; Department of Medicine, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA; Colorado Clinical and Translational Sciences Institute, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA.

University of Colorado, Denver, Business School, Denver, CO, USA.

Publication information

J Biomed Inform. 2014 Dec;52:43-54. doi: 10.1016/j.jbi.2014.01.016. Epub 2014 Feb 10.

DOI:10.1016/j.jbi.2014.01.016

PMID:24524889

Abstract

INTRODUCTION

Existing record linkage methods do not handle missing linking field values in an efficient and effective manner. The objective of this study is to investigate three novel methods for improving the accuracy and efficiency of record linkage when record linkage fields have missing values.

METHODS

By extending the Fellegi-Sunter scoring implementations available in the open-source Fine-grained Record Linkage (FRIL) software system, we developed three novel methods to solve the missing data problem in record linkage: Weight Redistribution, Distance Imputation, and Linkage Expansion. Weight Redistribution removes fields with missing data from the set of quasi-identifiers and redistributes the weight of each missing attribute across the remaining available linkage fields in proportion to their relative weights. Distance Imputation imputes the distance between the missing data fields rather than imputing the missing data value. Linkage Expansion adds fields previously considered non-linkage fields to the linkage field set to compensate for the missing information in a linkage field. We tested the linkage methods using simulated data sets with varying field value corruption rates.
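The first two methods can be illustrated with a minimal Python sketch. It is not the authors' FRIL extension: it assumes a simple weighted-sum match score, an exact-match similarity function, and a fixed imputed similarity of 0.5, none of which are specified in the abstract. The names `redistribute_weights`, `score_pair`, and `imputed_similarity` are hypothetical.

```python
from typing import Dict, Optional, Set


def redistribute_weights(weights: Dict[str, float], missing: Set[str]) -> Dict[str, float]:
    """Drop missing fields and rescale the remaining weights proportionally,
    so that the total weight is unchanged (Weight Redistribution)."""
    available = {f: w for f, w in weights.items() if f not in missing}
    total_available = sum(available.values())
    if total_available == 0:
        raise ValueError("no linkage field is available for this record pair")
    total = sum(weights.values())
    return {f: w * total / total_available for f, w in available.items()}


def exact_similarity(a: str, b: str) -> float:
    """Assumed comparator: 1.0 on exact match, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def score_pair(
    rec_a: Dict[str, Optional[str]],
    rec_b: Dict[str, Optional[str]],
    weights: Dict[str, float],
    imputed_similarity: Optional[float] = None,
) -> float:
    """Weighted-sum match score for one record pair.

    With imputed_similarity=None, missing fields are handled by Weight
    Redistribution. Otherwise a fixed similarity is substituted for any
    field missing on either side (a simple stand-in for Distance Imputation).
    """
    missing = {f for f in weights if rec_a.get(f) is None or rec_b.get(f) is None}
    if imputed_similarity is None:
        w = redistribute_weights(weights, missing)
        return sum(w[f] * exact_similarity(rec_a[f], rec_b[f]) for f in w)
    score = 0.0
    for f, wt in weights.items():
        sim = imputed_similarity if f in missing else exact_similarity(rec_a[f], rec_b[f])
        score += wt * sim
    return score


# Hypothetical weights and records, for illustration only.
weights = {"last_name": 40.0, "first_name": 30.0, "dob": 30.0}
a = {"last_name": "Smith", "first_name": "Ann", "dob": None}
b = {"last_name": "Smith", "first_name": "Ann", "dob": "1980-01-01"}

redistributed_score = score_pair(a, b, weights)
imputed_score = score_pair(a, b, weights, imputed_similarity=0.5)
```

In this toy pair the date of birth is missing on one side. Redistribution lets the two matching name fields carry the full weight, while the imputation variant credits the missing field with only part of its weight, so the pair scores lower.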

RESULTS

In data sets with low corruption rates, the methods had sensitivity ranging from 0.895 to 0.992 and positive predictive value (PPV) ranging from 0.865 to 1.0. Increased corruption rates led to decreased sensitivity for all methods.
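For readers unfamiliar with these metrics, the sketch below computes sensitivity and PPV from a set of predicted links and a set of true links, using the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The function name `linkage_metrics` and the example record-ID pairs are hypothetical and are not taken from the study's data.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (record ID in source A, record ID in source B)


def linkage_metrics(predicted_links: Iterable[Link], true_links: Iterable[Link]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for a set of predicted links."""
    predicted, truth = set(predicted_links), set(true_links)
    tp = len(predicted & truth)   # links correctly found
    fp = len(predicted - truth)   # links asserted but wrong
    fn = len(truth - predicted)   # true links that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy example: 4 true links, 3 predicted, 2 of them correct.
truth = {(1, 1), (2, 2), (3, 3), (4, 4)}
predicted = {(1, 1), (2, 2), (3, 4)}
sensitivity, ppv = linkage_metrics(predicted, truth)
```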

CONCLUSIONS

These new record linkage algorithms show promise in terms of accuracy and efficiency and may be valuable for combining large data sets at the patient level to support biomedical and clinical research.
