Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.

Affiliation

Department of Biomedical Informatics, University of Utah, Utah, USA.

Publication information

J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.

DOI:10.1016/j.jbi.2009.08.004
PMID:19683070
Abstract

Probabilistic record linkage is a method commonly used to determine whether demographic records refer to the same person. The Fellegi-Sunter method is a probabilistic approach that uses field weights based on log likelihood ratios to determine record similarity. This paper introduces an extension of the Fellegi-Sunter method that incorporates approximate field comparators in the calculation of field weights. The data warehouse of a large academic medical center was used as a case study. The approximate comparator extension was compared with the Fellegi-Sunter method in its ability to find duplicate records previously identified in the data warehouse using different demographic fields and matching cutoffs. The approximate comparator extension misclassified 25% fewer pairs and had a larger Welch's T statistic than the Fellegi-Sunter method for all field sets and matching cutoffs. The accuracy gain provided by the approximate comparator extension grew as less information was provided and as the matching cutoff increased. Given the ubiquity of linkage in both clinical and research settings, the incremental improvement of the extension has the potential to make a considerable impact.
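The weighting scheme described in the abstract can be illustrated with a minimal sketch. The standard Fellegi-Sunter field weights are log likelihood ratios built from an m-probability (the field agrees given a true match) and a u-probability (the field agrees given a non-match). The abstract does not state how the extension folds a similarity score into the weight, so the linear interpolation below is an assumption made for illustration, not the paper's formula. The function names, the example m/u values, and the use of `difflib.SequenceMatcher` as the approximate comparator are likewise hypothetical.

```python
import math
from difflib import SequenceMatcher


def agreement_weight(m: float, u: float) -> float:
    """Fellegi-Sunter weight when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Fellegi-Sunter weight when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


def exact_field_weight(a: str, b: str, m: float, u: float) -> float:
    """Classic binary comparison: full agreement or full disagreement."""
    return agreement_weight(m, u) if a == b else disagreement_weight(m, u)


def similarity(a: str, b: str) -> float:
    """Stand-in approximate comparator in [0, 1] (assumption: any string
    similarity measure could be substituted here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def approximate_field_weight(a: str, b: str, m: float, u: float) -> float:
    """Assumed extension: interpolate linearly between the disagreement
    and agreement weights according to the similarity score."""
    s = similarity(a, b)
    w_agree = agreement_weight(m, u)
    w_disagree = disagreement_weight(m, u)
    return w_disagree + s * (w_agree - w_disagree)


def record_score(rec_a: dict, rec_b: dict, params: dict, weight_fn) -> float:
    """Sum the field weights over all fields listed in params."""
    return sum(
        weight_fn(rec_a[field], rec_b[field], m, u)
        for field, (m, u) in params.items()
    )


# Hypothetical m/u probabilities per field; real values are estimated from data.
params = {"first_name": (0.90, 0.01), "last_name": (0.95, 0.005)}
rec_a = {"first_name": "Jonathan", "last_name": "Smith"}
rec_b = {"first_name": "Jonathon", "last_name": "Smith"}

exact = record_score(rec_a, rec_b, params, exact_field_weight)
approx = record_score(rec_a, rec_b, params, approximate_field_weight)
print(f"exact score:       {exact:.2f}")
print(f"approximate score: {approx:.2f}")
```

Under these assumptions, a one-letter spelling variant in the first name receives the full disagreement penalty from the exact comparison but only a partial penalty from the approximate one, so the pair scores closer to a true match. A pair is then classified by comparing its total score against a matching cutoff.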

Similar articles

1. Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.
   J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.
2. Real world performance of approximate string comparators for use in patient matching.
   Stud Health Technol Inform. 2004;107(Pt 1):43-7.
3. The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection.
   J Med Internet Res. 2022 Sep 29;24(9):e33775. doi: 10.2196/33775.
4. Some methods for blindfolded record linkage.
   BMC Med Inform Decis Mak. 2004 Jun 28;4:9. doi: 10.1186/1472-6947-4-9.
5. A new computationally efficient algorithm for record linkage with field dependency and missing data imputation.
   Int J Med Inform. 2018 Jan;109:70-75. doi: 10.1016/j.ijmedinf.2017.10.021. Epub 2017 Nov 6.
6. Issues in identification and linkage of patient records across an integrated delivery system.
   J Healthc Inf Manag. 1998 Fall;12(3):43-52.
7. The impact of a growing minority population on identification of duplicate records in an enterprise data warehouse.
   Stud Health Technol Inform. 2010;160(Pt 2):1122-6.
8. A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage.
   J Appl Stat. 2021 May 4;49(11):2789-2804. doi: 10.1080/02664763.2021.1922615. eCollection 2022.
9. Automated linkage of patient records from disparate sources.
   Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.
10. Probabilistic linkage of computerized ambulance and inpatient hospital discharge records: a potential tool for evaluation of emergency medical services.
    Ann Emerg Med. 2001 Jun;37(6):616-26. doi: 10.1067/mem.2001.115214.

Cited by

1. De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation.
   BMC Med Inform Decis Mak. 2023 May 5;23(1):85. doi: 10.1186/s12911-023-02176-6.
2. Implementation and validation of a probabilistic linkage method for population databases without identification variables.
   Heliyon. 2022 Dec 14;8(12):e12311. doi: 10.1016/j.heliyon.2022.e12311. eCollection 2022 Dec.
3. Optimizing the Retrieval of the Vital Status of Cancer Patients for Health Data Warehouses by Using Open Government Data in France.
   Int J Environ Res Public Health. 2022 Apr 2;19(7):4272. doi: 10.3390/ijerph19074272.
4. Data Centre Profile: The Provincial Health Data Centre of the Western Cape Province, South Africa.
   Int J Popul Data Sci. 2019 Nov 20;4(2):1143. doi: 10.23889/ijpds.v4i2.1143.
5. Evaluation of approximate comparison methods on Bloom filters for probabilistic linkage.
   Int J Popul Data Sci. 2019 May 23;4(1):1095. doi: 10.23889/ijpds.v4i1.1095.
6. Methodology for linking Ryan White HIV/AIDS Program Services Report (RSR) client level data over multiple years.
   PLoS One. 2020 Aug 21;15(8):e0237635. doi: 10.1371/journal.pone.0237635. eCollection 2020.
7. A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology.
   J Am Med Inform Assoc. 2020 Apr 1;27(4):505-513. doi: 10.1093/jamia/ocz232.
8. Unlocking the potential of population-based cancer registries.
   Cancer. 2019 Nov 1;125(21):3729-3737. doi: 10.1002/cncr.32355. Epub 2019 Aug 5.
9. Under-reporting of diagnosed tuberculosis to the national surveillance system in China: an inventory study in nine counties in 2015.
   BMJ Open. 2019 Jan 28;9(1):e021529. doi: 10.1136/bmjopen-2018-021529.
10. Estimating parameters for probabilistic linkage of privacy-preserved datasets.
    BMC Med Res Methodol. 2017 Jul 10;17(1):95. doi: 10.1186/s12874-017-0370-0.