Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study.

Affiliations

Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, United States.

HIV/STD Program, Public Health-Seattle and King County, Seattle, WA, United States.

Publication information

JMIR Public Health Surveill. 2020 Apr 30;6(2):e15917. doi: 10.2196/15917.

DOI:10.2196/15917
PMID:32352389
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7226047/
Abstract

BACKGROUND

Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities.

OBJECTIVE

This study aimed to compare the performance of record linkage algorithms commonly used in public health practice.

METHODS

We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic (fastLink and beta record linkage [BRL]) record linkage algorithms using simulations and a real-world scenario. We simulated pairs of datasets, varying the number of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated its recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In the real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm against a composite standard based on the agreement in matching decisions across all the algorithms and manual review.
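The evaluation loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: the field names, the single-character substitution error model, and the helper names (`make_pair`, `exact_match`, `recall_precision`) are all assumptions made for the example. It builds two small datasets with a chosen overlap and number of errors per record, links them with an exact deterministic rule, and computes recall and precision against the known truth.

```python
import random
import string

FIELDS = ("first", "last", "dob")  # hypothetical identifier fields


def random_record(rng):
    """Create one synthetic record; values are random strings, not real names."""
    return {
        "first": "".join(rng.choices(string.ascii_lowercase, k=6)),
        "last": "".join(rng.choices(string.ascii_lowercase, k=8)),
        "dob": f"{rng.randint(1950, 2000)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
    }


def corrupt(record, n_errors, rng):
    """Return a copy with n_errors single-character substitutions (assumed error model)."""
    out = dict(record)
    for _ in range(n_errors):
        field = rng.choice(FIELDS)
        pos = rng.randrange(len(out[field]))
        out[field] = out[field][:pos] + "#" + out[field][pos + 1:]
    return out


def make_pair(n_a, n_b, overlap, n_errors, seed=0):
    """Build datasets A and B sharing `overlap` people; B's copies carry errors."""
    rng = random.Random(seed)
    shared = [random_record(rng) for _ in range(overlap)]
    a = shared + [random_record(rng) for _ in range(n_a - overlap)]
    b = [corrupt(r, n_errors, rng) for r in shared]
    b += [random_record(rng) for _ in range(n_b - overlap)]
    truth = {(i, i) for i in range(overlap)}  # true matches as (index in A, index in B)
    return a, b, truth


def exact_match(a, b):
    """Deterministic rule: link records that agree exactly on every field."""
    index = {}
    for j, rec in enumerate(b):
        index.setdefault(tuple(rec[f] for f in FIELDS), []).append(j)
    return {(i, j) for i, rec in enumerate(a)
            for j in index.get(tuple(rec[f] for f in FIELDS), [])}


def recall_precision(predicted, truth):
    """Recall = TP / true matches; precision = TP / predicted matches."""
    tp = len(predicted & truth)
    recall = tp / len(truth) if truth else None
    precision = tp / len(predicted) if predicted else None
    return recall, precision


for n_errors in (0, 1):
    a, b, truth = make_pair(n_a=200, n_b=200, overlap=100, n_errors=n_errors)
    print(n_errors, recall_precision(exact_match(a, b), truth))
```

Under this error model every corrupted record differs from its original in at least one field, so the exact rule finds none of the overlapping records once `n_errors` is 1 or more. That is the failure mode the comparison is designed to expose.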

RESULTS

In simulations, BRL and fastLink maintained high recall at nearly all data quality levels, with precision comparable to that of the deterministic algorithms. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL showed the smallest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%).
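The real-world figures follow directly from the definitions of recall and precision given in the Methods. As a quick check, assuming the counts are read as 309 true matches in the composite standard, all 309 identified by BRL, and 312 links declared by BRL in total, the arithmetic is:

```python
true_matches = 309      # matches in the composite standard
true_positives = 309    # of those, identified by BRL
declared_links = 312    # all links declared by BRL

recall = true_positives / true_matches
precision = true_positives / declared_links
false_positives = declared_links - true_positives

print(f"recall={recall:.1%} precision={precision:.1%} false positives={false_positives}")
# prints: recall=100.0% precision=99.0% false positives=3
```

On this reading, BRL missed no match in the composite standard and declared 3 links that the standard did not confirm.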

CONCLUSIONS

Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
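To illustrate why probabilistic scoring can recover matches that an exact rule misses, here is a minimal sketch in the style of the Fellegi-Sunter model, which underlies many probabilistic linkage tools. It is not the fastLink or BRL implementation: the field names, the example records, and the m and u probabilities are made-up values for illustration (in practice these probabilities would be estimated from the data), and the threshold is arbitrary.

```python
import math

# Assumed parameters (illustrative only):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "dob": (0.98, 0.001),
}


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over fields (agreement or disagreement)."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


a = {"first": "maria", "last": "lopez", "dob": "1980-02-11"}
b_typo = {"first": "marja", "last": "lopez", "dob": "1980-02-11"}  # one typo
b_other = {"first": "john", "last": "smith", "dob": "1975-07-30"}

THRESHOLD = 5.0  # arbitrary cut-off for declaring a link
for name, b in (("typo", b_typo), ("other", b_other)):
    w = match_weight(a, b)
    print(name, round(w, 2), "link" if w >= THRESHOLD else "no link")
```

An exact rule would reject the pair with the typo because the first names differ. The summed weight still clears the threshold, because agreement on the other two fields outweighs the single disagreement.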

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8ff/7226047/903fe3d3fc20/publichealth_v6i2e15917_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8ff/7226047/5b5d3fc989cf/publichealth_v6i2e15917_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8ff/7226047/8ba94d9a413b/publichealth_v6i2e15917_fig2.jpg
