Real-World Matching Performance of Deidentified Record-Linking Tokens.

Affiliations

School of Biomedical Informatics, The University of Texas Health Science Center, Houston, Texas, United States.

Division of General Internal Medicine, Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, United States.

Publication information

Appl Clin Inform. 2022 Aug;13(4):865-873. doi: 10.1055/a-1910-4154. Epub 2022 Jul 27.

DOI:10.1055/a-1910-4154
PMID:35896508
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9474266/
Abstract

OBJECTIVE

Our objective was to evaluate tokens commonly used by clinical research consortia to aggregate clinical data across institutions.

METHODS

This study compares individual tokens and token-based matching algorithms against manual annotation of 20,002 record pairs, extracted from the University of Texas Houston clinical data warehouse (CDW), in terms of entity resolution performance.
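The evaluation design can be illustrated with a small sketch. Each record pair carries a predicted match (from a token or an algorithm) and a manual-annotation label, and precision and recall follow from those two values. This is a generic illustration that assumes one boolean label per pair. The function name `precision_recall` is hypothetical, and the toy data are invented rather than taken from the study.

```python
from typing import Iterable, Tuple


def precision_recall(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Score predicted matches against manual annotation.

    Each item is (predicted_match, annotated_match) for one record pair.
    Precision = TP / (TP + FP); recall = TP / (TP + FN).
    """
    tp = fp = fn = 0
    for predicted, annotated in pairs:
        if predicted and annotated:
            tp += 1  # token says match, annotator agrees
        elif predicted:
            fp += 1  # token says match, annotator says different people
        elif annotated:
            fn += 1  # annotator says match, token missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: three true matches found, one missed, one true non-match.
toy_pairs = [(True, True)] * 3 + [(False, True), (False, False)]
print(precision_recall(toy_pairs))  # (1.0, 0.75)
```

Precision penalizes pairs wrongly linked (different people merged). Recall penalizes pairs the tokens failed to link (one person split into several records).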

RESULTS

The highest precision achieved was 99.9%, with a token derived from first name, last name, gender, and date of birth. The highest recall achieved was 95.5%, with an algorithm combining tokens built from different combinations of first name, last name, gender, date of birth, and social security number.
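A sketch helps show why a multi-token algorithm can reach higher recall than any single token. If a pair is declared a match when any of several tokens agrees, an error or missing value in one field breaks only the tokens that include that field. The token names (`fn_ln_sex_dob`, `ln_dob_ssn`) and the any-token rule below are illustrative assumptions. They are not the specific algorithms evaluated in the study.

```python
from typing import Mapping, Optional, Sequence

# Each record maps a (hypothetical) token name to its deidentified value,
# or None when an input field needed for that token was missing.
TokenSet = Mapping[str, Optional[str]]


def single_token_match(a: TokenSet, b: TokenSet, name: str) -> bool:
    """Match only if the named token is present in both records and identical."""
    value_a, value_b = a.get(name), b.get(name)
    return value_a is not None and value_a == value_b


def any_token_match(a: TokenSet, b: TokenSet, names: Sequence[str]) -> bool:
    """Match if at least one listed token agrees: more matches, possibly more false positives."""
    return any(single_token_match(a, b, name) for name in names)


# Toy records: a first-name typo changes the name-based token but not the SSN-based one.
r1 = {"fn_ln_sex_dob": "tok-a", "ln_dob_ssn": "tok-x"}
r2 = {"fn_ln_sex_dob": "tok-b", "ln_dob_ssn": "tok-x"}
print(single_token_match(r1, r2, "fn_ln_sex_dob"))               # False
print(any_token_match(r1, r2, ["fn_ln_sex_dob", "ln_dob_ssn"]))  # True
```

The trade-off mirrors the reported results. A single token built from many fields is strict, which favors precision. Accepting agreement on any of several tokens recovers pairs with partial errors, which favors recall, at the risk of linking different people who happen to share one token.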

DISCUSSION

To protect the privacy of patient data, identifying information must be removed from a health care dataset to obscure the identity of the individuals from whom the data were derived. However, once identifying information is removed, records belonging to the same individual can no longer be linked for analysis. Tokens are a mechanism for converting patient identifying information into deidentified elements that comply with the Health Insurance Portability and Accountability Act (HIPAA) and can be used to link clinical records while preserving patient privacy.
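One common way to build such a token is a keyed cryptographic hash of normalized identifiers. Sites that share a secret key can then derive identical tokens for the same person without exchanging names or dates of birth. The sketch below assumes HMAC-SHA256 over normalized first name, last name, sex, and date of birth. The normalization rules, the `|` separator, the key, and the helper names are illustrative assumptions. The tokens evaluated in the study may use different procedures.

```python
import hashlib
import hmac
import unicodedata
from datetime import date


def normalize_name(name: str) -> str:
    """Uppercase and keep only letters, dropping accents, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)  # "é" -> "e" + combining accent
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def make_token(secret_key: bytes, first: str, last: str, sex: str, dob: date) -> str:
    """Return a hex HMAC-SHA256 token over normalized first/last name, sex, and DOB."""
    message = "|".join([
        normalize_name(first),
        normalize_name(last),
        sex.strip().upper()[:1],  # e.g. "male" / "M" -> "M"
        dob.isoformat(),          # YYYY-MM-DD
    ])
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical shared key; in practice it would be managed by a trusted party.
key = b"example-shared-secret"
t1 = make_token(key, "José", "O'Brien", "male", date(1980, 1, 2))
t2 = make_token(key, " jose", "OBRIEN", "M", date(1980, 1, 2))
print(t1 == t2)  # True: formatting differences vanish after normalization
```

Using a secret key means an outsider cannot rebuild tokens by hashing guessed identifiers, which an unkeyed hash of the same fields would allow. Normalization absorbs formatting differences, but a genuine typo in any field still produces an unrelated token. This is one reason exact-token matching depends on the accuracy of the underlying data.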

CONCLUSION

Depending on the availability and accuracy of the underlying data, tokens can resolve and link entities with high precision and recall in real-world data derived from a CDW.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e123/9474266/4d38ded0d407/10-1055-a-1910-4154-i202204ra0122-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e123/9474266/957396be42c9/10-1055-a-1910-4154-i202204ra0122-1.jpg

Similar articles

1
Real-World Matching Performance of Deidentified Record-Linking Tokens.
Appl Clin Inform. 2022 Aug;13(4):865-873. doi: 10.1055/a-1910-4154. Epub 2022 Jul 27.
2
Designing an algorithm to preserve privacy for medical record linkage with error-prone data.
JMIR Med Inform. 2014 Jan 20;2(1):e2. doi: 10.2196/medinform.3090.
3
Privacy-Preserving Record Linkage of Deidentified Records Within a Public Health Surveillance System: Evaluation Study.
J Med Internet Res. 2020 Jun 24;22(6):e16757. doi: 10.2196/16757.
4
Privacy-preserving record linkage across disparate institutions and datasets to enable a learning health system: The national COVID cohort collaborative (N3C) experience.
Learn Health Syst. 2024 Jan 11;8(1):e10404. doi: 10.1002/lrh2.10404. eCollection 2024 Jan.
5
The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them.
J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.
6
Complying with the Health Insurance Portability and Accountability Act. Privacy standards.
AAOHN J. 2001 Nov;49(11):501-7.
7
Privacy-preserving matching of similar patients.
J Biomed Inform. 2016 Feb;59:285-98. doi: 10.1016/j.jbi.2015.12.004. Epub 2015 Dec 17.
8
Nonspecific deidentification of date-like text in deidentified clinical notes enables reidentification of dates.
J Am Med Inform Assoc. 2022 Oct 7;29(11):1967-1971. doi: 10.1093/jamia/ocac147.
9
Security of electronic medical information and patient privacy: what you need to know.
J Am Coll Radiol. 2014 Dec;11(12 Pt B):1212-6. doi: 10.1016/j.jacr.2014.09.011. Epub 2014 Dec 1.
10
A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System.
JMIR Med Inform. 2022 Dec 30;10(12):e38922. doi: 10.2196/38922.

Cited by

1
Enabling secure and self determined health data sharing and consent management.
NPJ Digit Med. 2025 Aug 30;8(1):560. doi: 10.1038/s41746-025-01945-z.
2
Real-world pharmacotherapy treatment patterns among patients diagnosed with postpartum depression in the United States.
BMC Psychiatry. 2025 Jun 4;25(1):572. doi: 10.1186/s12888-025-06977-z.
3
Association of Polygenic-Based Breast Cancer Risk Prediction With Patient Management.
JCO Precis Oncol. 2025 May;9:e2400716. doi: 10.1200/PO-24-00716. Epub 2025 May 7.
4
Linkage of Clinical Trial Data to Routinely Collected Data Sources: A Scoping Review.
JAMA Netw Open. 2025 Apr 1;8(4):e257797. doi: 10.1001/jamanetworkopen.2025.7797.
5
Longitudinal Relationship Between Elevated Liver Biochemical Tests and Negative Clinical Outcomes in Primary Biliary Cholangitis: A Population-Based Study.
Aliment Pharmacol Ther. 2025 Jun;61(11):1775-1784. doi: 10.1111/apt.70120. Epub 2025 Apr 2.
6
Screening Positive for Rare Autosomal Aneuploidies Increases Frequency of Adverse Pregnancy Outcomes and Alters Clinical Management.
Prenat Diagn. 2025 Sep;45(10):1265-1276. doi: 10.1002/pd.6776. Epub 2025 Mar 23.
7
Burden of illness for patients with primary biliary cholangitis: an observational study of clinical characteristics and healthcare resource utilization.
J Comp Eff Res. 2025 Apr;14(4):e240174. doi: 10.57264/cer-2024-0174. Epub 2025 Mar 6.
8
Accuracy of privacy preserving record linkage for real world data in the United States: a systemic review.
JAMIA Open. 2025 Jan 22;8(1):ooaf002. doi: 10.1093/jamiaopen/ooaf002. eCollection 2025 Feb.
9
Hepatic real-world outcomes with obeticholic acid in primary biliary cholangitis (HEROES): A trial emulation study design.
Hepatology. 2025 Jun 1;81(6):1647-1659. doi: 10.1097/HEP.0000000000001174. Epub 2025 Jan 3.
10
Linking clinical trial participants to their U.S. real-world data through tokenization: A practical guide.
Contemp Clin Trials Commun. 2024 Aug 17;41:101354. doi: 10.1016/j.conctc.2024.101354. eCollection 2024 Oct.