Probabilistic linkage of large public health data files.

Author information

Jaro M A

Affiliation

Match Ware Technologies, Inc., Silver Spring, MD 20905, USA.

Publication information

Stat Med. 1995;14(5-7):491-8. doi: 10.1002/sim.4780140510.

DOI: 10.1002/sim.4780140510

PMID: 7792443

Abstract

Probabilistic linkage technology makes it feasible and efficient to link large public health databases in a statistically justifiable manner. The problem addressed by the methodology is that of matching two files of individual data under conditions of uncertainty. Each field is subject to error which is measured by the probability that the field agrees given a record pair matches (called the m probability) and probabilities of chance agreement of its value states (called the u probability). Fellegi and Sunter pioneered record linkage theory. Advances in methodology include use of an EM algorithm for parameter estimation, optimization of matches by means of a linear sum assignment program, and more recently, a probability model that addresses both m and u probabilities for all value states of a field. This provides a means for obtaining greater precision from non-uniformly distributed fields, without the theoretical complications arising from frequency-based matching alone. The model includes an iterative parameter estimation procedure that is more robust than pre-match estimation techniques. The methodology was originally developed and tested by the author at the U.S. Census Bureau for census undercount estimation. The more recent advances and a new generalized software system were tested and validated by linking highway crashes to Emergency Medical Service (EMS) reports and to hospital admission records for the National Highway Traffic Safety Administration (NHTSA).
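The m and u probabilities described in the abstract can be illustrated with a minimal sketch. It is not the author's software: it assumes the textbook Fellegi-Sunter setup with binary agree/disagree comparisons and conditional independence between fields, and it does not implement the value-specific model or the linear sum assignment step mentioned above. The function names (`field_weights`, `pair_score`, `em_estimate`), the starting values, and the toy agreement patterns are all hypothetical, chosen only for illustration.

```python
import math

def field_weights(m, u):
    """Return (agreement, disagreement) log2 weights for one field.

    m: P(field agrees | pair is a true match)
    u: P(field agrees | pair is not a match), i.e. chance agreement
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree

def pair_score(agreements, m_probs, u_probs):
    """Composite weight of one record pair: sum of per-field weights."""
    total = 0.0
    for agrees, m, u in zip(agreements, m_probs, u_probs):
        a, d = field_weights(m, u)
        total += a if agrees else d
    return total

def em_estimate(patterns, n_fields, iters=50, m0=0.9, u0=0.1, p0=0.1):
    """Estimate m, u and the match proportion p by EM.

    patterns: one tuple of 0/1 agreement indicators per candidate pair.
    Assumes a two-class mixture (match / non-match) with fields that are
    conditionally independent given the class.
    """
    eps = 1e-6  # keep probabilities away from 0 and 1
    clamp = lambda x: min(max(x, eps), 1 - eps)
    m, u, p = [m0] * n_fields, [u0] * n_fields, p0
    n = len(patterns)
    for _ in range(iters):
        # E-step: posterior probability that each pair is a match
        g = []
        for pat in patterns:
            pm, pu = p, 1 - p
            for k, a in enumerate(pat):
                pm *= m[k] if a else 1 - m[k]
                pu *= u[k] if a else 1 - u[k]
            g.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the posterior weights
        sg = sum(g)
        for k in range(n_fields):
            m[k] = clamp(sum(gi * pat[k] for gi, pat in zip(g, patterns)) / sg)
            u[k] = clamp(sum((1 - gi) * pat[k] for gi, pat in zip(g, patterns)) / (n - sg))
        p = clamp(sg / n)
    return m, u, p

# Toy agreement patterns over three fields (made up for illustration)
toy_patterns = ([(1, 1, 1)] * 20 + [(0, 0, 0)] * 70 +
                [(1, 0, 0)] * 5 + [(0, 1, 0)] * 5)
m_hat, u_hat, p_hat = em_estimate(toy_patterns, n_fields=3)
print("estimated m:", m_hat)
print("estimated u:", u_hat)
print("score for full agreement:", pair_score((1, 1, 1), m_hat, u_hat))
```

In this sketch a pair that agrees on every field gets a high positive score and a pair that disagrees on every field gets a negative one. In practice, pairs would be classified as links or non-links by comparing the score against thresholds.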

Similar articles

Probabilistic linkage of large public health data files.

Stat Med. 1995;14(5-7):491-8. doi: 10.1002/sim.4780140510.

Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.

J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.

A new computationally efficient algorithm for record linkage with field dependency and missing data imputation.

Int J Med Inform. 2018 Jan;109:70-75. doi: 10.1016/j.ijmedinf.2017.10.021. Epub 2017 Nov 6.

Automated linkage of patient records from disparate sources.

Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.

The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection.

J Med Internet Res. 2022 Sep 29;24(9):e33775. doi: 10.2196/33775.

A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage.

J Appl Stat. 2021 May 4;49(11):2789-2804. doi: 10.1080/02664763.2021.1922615. eCollection 2022.

Estimating parameters for probabilistic linkage of privacy-preserved datasets.

BMC Med Res Methodol. 2017 Jul 10;17(1):95. doi: 10.1186/s12874-017-0370-0.

Evaluation of record linkage methods for iterative insertions.

Methods Inf Med. 2009;48(5):429-37. doi: 10.3414/ME9238. Epub 2009 Aug 20.

Controlling false match rates in record linkage using extreme value theory.

J Biomed Inform. 2011 Aug;44(4):648-54. doi: 10.1016/j.jbi.2011.02.008. Epub 2011 Feb 23.

Variable selection for latent class analysis in the presence of missing data with application to record linkage.

Stat Methods Med Res. 2024 Jun;33(6):966-980. doi: 10.1177/09622802241242317. Epub 2024 Apr 9.

Cited by

Distinctively black names and mechanisms of discrimination: Evidence from the early 20th century.

Soc Sci Res. 2025 Feb;126:103136. doi: 10.1016/j.ssresearch.2024.103136. Epub 2024 Dec 21.

Do continuous glucose monitoring (CGM) metrics predict macrovascular and microvascular complications in diabetes? The FACULTY protocol of a retrospective real-world cohort study.

BMJ Open. 2025 Jan 8;15(1):e085961. doi: 10.1136/bmjopen-2024-085961.

Transition from rehabilitation hospital to the National Disability Insurance Scheme (NDIS) for people with brain injury and spinal cord injury: a data linkage protocol.

BMJ Open. 2024 Aug 19;14(8):e082802. doi: 10.1136/bmjopen-2023-082802.

Changes in Emergency Department Pediatric Readiness and Mortality.

JAMA Netw Open. 2024 Jul 1;7(7):e2422107. doi: 10.1001/jamanetworkopen.2024.22107.

Timing and causes of death to 1 year among children presenting to emergency departments.

Acad Emerg Med. 2024 Jun;31(6):555-563. doi: 10.1111/acem.14875. Epub 2024 Mar 18.

Healthcare and Cancer Treatment Costs of Breast Screening Outcomes among Higher than Average Risk Women.

Curr Oncol. 2023 Sep 18;30(9):8550-8562. doi: 10.3390/curroncol30090620.

Virtual patient identifier (vPID): Improving patient traceability using anonymized identifiers in Japanese healthcare insurance claims database.

Heliyon. 2023 May 12;9(5):e16209. doi: 10.1016/j.heliyon.2023.e16209. eCollection 2023 May.

Direct medical charges of all parties in teen-involved vehicle crashes by culpability.

Inj Prev. 2023 Aug;29(4):334-339. doi: 10.1136/ip-2022-044841. Epub 2023 May 5.

Emergency Department Pediatric Readiness and Short-term and Long-term Mortality Among Children Receiving Emergency Care.

JAMA Netw Open. 2023 Jan 3;6(1):e2250941. doi: 10.1001/jamanetworkopen.2022.50941.

A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System.

JMIR Med Inform. 2022 Dec 30;10(12):e38922. doi: 10.2196/38922.
