Automated linkage of patient records from disparate sources.

Authors

Li Xiaochun, Xu Huiping, Shen Changyu, Grannis Shaun

Affiliation

Indiana University School of Medicine, Indianapolis, USA.

Publication

Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.

DOI: 10.1177/0962280215626180
PMID: 28034172

Abstract

We introduce an automated method of record linkage that has two key features: automated selection of match field interactions to include in the model for estimation, and automated threshold determination for classifying record pairs as matches or non-matches. We applied our method to two real-world examples. The first example demonstrated results consistent with our earlier work: when data quality is adequate and the match field discriminating power is high, matching algorithms exhibit similar performance. The second example demonstrated that our method yields a lower false positive rate and higher positive predictive value than the Fellegi-Sunter model in the face of low data quality. When compared to the Fellegi-Sunter model, simulation studies suggest that our method exhibits better overall performance, as indicated by higher area under the curve, and less biased estimates for both the match prevalence rate and the m- and u-probabilities over a range of data scenarios, especially when the match prevalence is extreme. Computationally, our method is as efficient as the Fellegi-Sunter model. We recommend this method in situations where an unsupervised linking algorithm is needed.
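
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of classical Fellegi-Sunter scoring under the usual conditional-independence assumption. It is not the authors' method: it includes no field interactions and no automated threshold selection. Here the m-probability of a field is the probability that the field agrees when the pair is a true match, and the u-probability is the probability that it agrees when the pair is a non-match. The function names and all numeric values are hypothetical, chosen only for illustration.

```python
import math

def fs_weight(agreements, m, u):
    """Fellegi-Sunter match weight for one record pair.

    Sums log2 likelihood ratios over fields, assuming fields are
    conditionally independent given match status (no interactions).
    """
    total = 0.0
    for agree, m_k, u_k in zip(agreements, m, u):
        if agree:
            total += math.log2(m_k / u_k)
        else:
            total += math.log2((1 - m_k) / (1 - u_k))
    return total

def match_posterior(agreements, m, u, prevalence):
    """Posterior probability that the pair is a match (Bayes' rule)."""
    prior_log_odds = math.log2(prevalence / (1 - prevalence))
    log_odds = fs_weight(agreements, m, u) + prior_log_odds
    return 1.0 / (1.0 + 2.0 ** (-log_odds))

# Hypothetical parameters for three match fields (illustration only).
m = [0.95, 0.90, 0.85]   # P(field agrees | true match)
u = [0.01, 0.05, 0.10]   # P(field agrees | non-match)
prevalence = 0.01        # assumed share of true matches among pairs

w_all = fs_weight([1, 1, 1], m, u)    # every field agrees
w_none = fs_weight([0, 0, 0], m, u)   # every field disagrees
p_all = match_posterior([1, 1, 1], m, u, prevalence)
p_none = match_posterior([0, 0, 0], m, u, prevalence)
```

A pair is then classified as a match when its weight (or posterior) exceeds a chosen threshold. In the classical model that threshold is typically set by the analyst, which is the step the abstract says the proposed method automates.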

Similar articles

1
Automated linkage of patient records from disparate sources.
Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.
2
The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection.
J Med Internet Res. 2022 Sep 29;24(9):e33775. doi: 10.2196/33775.
3
Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.
J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.
4
A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage.
J Appl Stat. 2021 May 4;49(11):2789-2804. doi: 10.1080/02664763.2021.1922615. eCollection 2022.
5
Variable selection for latent class analysis in the presence of missing data with application to record linkage.
Stat Methods Med Res. 2024 Jun;33(6):966-980. doi: 10.1177/09622802241242317. Epub 2024 Apr 9.
6
A new computationally efficient algorithm for record linkage with field dependency and missing data imputation.
Int J Med Inform. 2018 Jan;109:70-75. doi: 10.1016/j.ijmedinf.2017.10.021. Epub 2017 Nov 6.
7
Controlling false match rates in record linkage using extreme value theory.
J Biomed Inform. 2011 Aug;44(4):648-54. doi: 10.1016/j.jbi.2011.02.008. Epub 2011 Feb 23.
8
Probabilistic linkage of large public health data files.
Stat Med. 1995;14(5-7):491-8. doi: 10.1002/sim.4780140510.
9
Evaluation of record linkage methods for iterative insertions.
Methods Inf Med. 2009;48(5):429-37. doi: 10.3414/ME9238. Epub 2009 Aug 20.
10
An empiric weight computation for record linkage using linearly combined fields' similarity scores.
Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:1346-9. doi: 10.1109/EMBC.2014.6943848.

Cited by

1
Synthetic data in health care: A narrative review.
PLOS Digit Health. 2023 Jan 6;2(1):e0000082. doi: 10.1371/journal.pdig.0000082. eCollection 2023 Jan.
2
Syphilis testing adherence among women with livebirth deliveries: Indianapolis 2014-2016.
BMC Pregnancy Childbirth. 2021 Oct 30;21(1):739. doi: 10.1186/s12884-021-04211-8.