An efficient record linkage scheme using graphical analysis for identifier error detection.

Affiliation

NIHR Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK.

Publication information

BMC Med Inform Decis Mak. 2011 Feb 1;11:7. doi: 10.1186/1472-6947-11-7.

DOI:10.1186/1472-6947-11-7
PMID:21284874
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3039555/
Abstract

BACKGROUND

Integration of information on individuals (record linkage) is a key problem in healthcare delivery, epidemiology, and "business intelligence" applications. It is now common to be required to link very large numbers of records, often containing various combinations of theoretically unique identifiers, such as NHS numbers, which are both incomplete and error-prone.

METHODS

We describe a two-step record linkage algorithm in which identifiers with high cardinality are identified or generated, and used to perform an initial exact match based linkage. Subsequently, the resulting clusters are studied and, if appropriate, partitioned using a graph based algorithm detecting erroneous identifiers.
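
The two steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the data structures or the graph algorithm, so the sketch assumes union-find for the exact-match step and, for the second step, a simple heuristic that flags an identifier value when removing it splits a cluster into parts that each carry at most one value of a theoretically unique identifier. The record layout, the field names (`nhs`, `mrn`) and the function names are hypothetical.

```python
from collections import defaultdict


def link_exact(records):
    """Step 1: cluster records that share at least one exact identifier value.

    `records` maps a record id to a dict {identifier_type: value};
    a missing identifier is absent or None.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (identifier_type, value) -> first record carrying it
    for rid, ids in records.items():
        for key in ids.items():
            if key[1] is None:
                continue
            if key in first_seen:
                parent[find(rid)] = find(first_seen[key])
            else:
                first_seen[key] = rid

    clusters = defaultdict(list)
    for rid in records:
        clusters[find(rid)].append(rid)
    return sorted(sorted(members) for members in clusters.values())


def suspect_identifiers(records, cluster, unique_type="nhs"):
    """Step 2 (assumed heuristic): identifier values that wrongly merge patients.

    A cluster is examined only if `unique_type` takes more than one value in it.
    A value is flagged when dropping it splits the cluster into parts that are
    each free of such conflicts.
    """
    def conflicted(members):
        values = {records[r].get(unique_type) for r in members} - {None}
        return len(values) > 1

    if not conflicted(cluster):
        return []
    keys = sorted({(t, v) for r in cluster
                   for t, v in records[r].items() if v is not None})
    suspects = []
    for key in keys:
        reduced = {r: {t: v for t, v in records[r].items() if (t, v) != key}
                   for r in cluster}
        parts = link_exact(reduced)
        if len(parts) > 1 and not any(conflicted(p) for p in parts):
            suspects.append(key)
    return suspects


# Hypothetical data: hospital number "A" has been attached to record r4 by mistake.
records = {
    "r1": {"nhs": "111", "mrn": "A"},
    "r2": {"nhs": "111", "mrn": "A"},
    "r3": {"nhs": "222", "mrn": "B"},
    "r4": {"nhs": "222", "mrn": "A"},
    "r5": {"nhs": "333"},
}
clusters = link_exact(records)
print(clusters)                                    # [['r1', 'r2', 'r3', 'r4'], ['r5']]
print(suspect_identifiers(records, clusters[0]))   # [('mrn', 'A')]
```

The heuristic re-links the cluster once per identifier value, which is acceptable only because it runs on individual clusters already flagged as conflicting rather than on the full data set.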

RESULTS

The system was used to cluster over 250 million health records from five data sources within a large UK hospital group. Linkage, which was completed in about 30 minutes, yielded 3.6 million clusters of which about 99.8% contain, with high likelihood, records from one patient. Although computationally efficient, the algorithm's requirement for exact matching of at least one identifier of each record to another for cluster formation may be a limitation in some databases containing records of low identifier quality.
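
For a sense of scale, the reported figures can be turned into rough derived quantities. The inputs are the rounded values from the paragraph above ("over 250 million", "about 30 minutes", "about 99.8%"), so the outputs are order-of-magnitude estimates only; the variable names are illustrative.

```python
records = 250_000_000          # "over 250 million", so a lower bound
clusters = 3_600_000
single_patient_share = 0.998   # "about 99.8%"
minutes = 30                   # "about 30 minutes"

likely_single = round(clusters * single_patient_share)
remaining = clusters - likely_single
records_per_second = records / (minutes * 60)
mean_cluster_size = records / clusters

print(likely_single)               # 3592800
print(remaining)                   # 7200
print(round(records_per_second))   # 138889
print(round(mean_cluster_size, 1)) # 69.4
```

On these numbers, roughly 7,200 clusters fall outside the high-likelihood single-patient group, and linkage throughput is on the order of 140,000 records per second.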

CONCLUSIONS

The technique described offers a simple, fast and highly efficient two-step method for large scale initial linkage for records commonly found in the UK's National Health Service.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2126/3039555/76860a57e459/1472-6947-11-7-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2126/3039555/60a2cc0f2cc7/1472-6947-11-7-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2126/3039555/d4e035a39977/1472-6947-11-7-2.jpg
