Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes.

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

University Bordeaux, ISPED, Inserm Bordeaux Population Health Research Center, UMR 1219, Inria SISTM, Bordeaux F-33000, France.

Publication information

Sci Data. 2019 Jan 8;6:180298. doi: 10.1038/sdata.2018.298.

DOI:10.1038/sdata.2018.298

PMID:30620344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6326114/

Abstract

We develop an algorithm for probabilistic linkage of de-identified research datasets at the patient level, when only diagnosis codes with discrepancies and no personal health identifiers such as name or date of birth are available. It relies on Bayesian modelling of binarized diagnosis codes, and provides a posterior probability of matching for each patient pair, while considering all the data at once. Both in our simulation study (using an administrative claims dataset for data generation) and in two real use-cases linking patient electronic health records from a large tertiary care network, our method exhibits good performance and compares favourably to the standard baseline Fellegi-Sunter algorithm. We propose a scalable, fast and efficient open-source implementation in the ludic R package available on CRAN, which also includes the anonymized diagnosis code data from our real use-case. This work suggests it is possible to link de-identified research databases stripped of any personal health identifiers using only diagnosis codes, provided sufficient information is shared between the data sources.
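
To make the idea of a "posterior probability of matching for each patient pair" concrete, here is a minimal sketch in Python. It is not the ludic implementation and not the model described in the paper: it scores a single pair in isolation using simplified Fellegi-Sunter-style agreement weights under conditional independence, whereas the abstract describes a method that considers all the data at once. The function name `match_posterior` and the values used for `m`, `u`, and `prior` are hypothetical, chosen only for illustration.

```python
import math


def match_posterior(a, b, m, u, prior):
    """Posterior probability that two binarized code vectors are the same patient.

    All parameters are illustrative assumptions, not values from the paper:
      a, b  : lists of 0/1 flags, one per diagnosis code
      m[k]  : assumed P(the two records agree on code k | same patient)
      u[k]  : assumed P(the two records agree on code k | different patients)
      prior : assumed prior probability that a random pair is a true match
    """
    if not (len(a) == len(b) == len(m) == len(u)):
        raise ValueError("all inputs must have the same length")

    # Sum per-code log-likelihood ratios, treating codes as independent.
    log_lr = 0.0
    for ak, bk, mk, uk in zip(a, b, m, u):
        if ak == bk:
            log_lr += math.log(mk / uk)
        else:
            log_lr += math.log((1.0 - mk) / (1.0 - uk))

    # Bayes' rule on the log-odds scale, then map back to a probability.
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical example: 8 binarized codes, identical assumed weights per code.
record = [1, 0, 0, 1, 0, 1, 0, 0]
other = [0, 1, 1, 0, 0, 1, 0, 0]  # disagrees with `record` on 4 codes
m = [0.95] * 8
u = [0.50] * 8

p_agree = match_posterior(record, record, m, u, prior=0.01)
p_differ = match_posterior(record, other, m, u, prior=0.01)
print(f"agreeing pair: {p_agree:.3f}, disagreeing pair: {p_differ:.6f}")
```

Under these assumed weights, full agreement raises the match probability above the prior and four disagreements push it well below. In practice the weights would have to be estimated from the data rather than fixed by hand.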

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/367f/6326114/02bf21acca9a/sdata2018298-f1.jpg

Similar articles

De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation.

BMC Med Inform Decis Mak. 2023 May 5;23(1):85. doi: 10.1186/s12911-023-02176-6.

A new computationally efficient algorithm for record linkage with field dependency and missing data imputation.

Int J Med Inform. 2018 Jan;109:70-75. doi: 10.1016/j.ijmedinf.2017.10.021. Epub 2017 Nov 6.

Utilising identifier error variation in linkage of large administrative data sources.

BMC Med Res Methodol. 2017 Feb 7;17(1):23. doi: 10.1186/s12874-017-0306-8.

Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.

J Biomed Inform. 2010 Feb;43(1):24-30. doi: 10.1016/j.jbi.2009.08.004. Epub 2009 Aug 13.

The Data-Adaptive Fellegi-Sunter Model for Probabilistic Record Linkage: Algorithm Development and Validation for Incorporating Missing Data and Field Selection.

J Med Internet Res. 2022 Sep 29;24(9):e33775. doi: 10.2196/33775.

A method for cohort selection of cardiovascular disease records from an electronic health record system.

Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.

Designing an Innovative Data Architecture for the Los Angeles Data Resource (LADR).

Stud Health Technol Inform. 2015;216:1055.

Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy.

PLoS One. 2018 Jul 9;13(7):e0199815. doi: 10.1371/journal.pone.0199815. eCollection 2018.

Patient Record Linkage for Data Quality Assessment Based on Time Series Matching.

Stud Health Technol Inform. 2019;260:210-217.

Cited by

Envisioning the Future of Personalized Medicine: Role and Realities of Digital Twins.

J Med Internet Res. 2024 May 13;26:e50204. doi: 10.2196/50204.

[Re-identification potential of structured health data].

Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2024 Feb;67(2):164-170. doi: 10.1007/s00103-023-03820-2. Epub 2024 Jan 17.

Record linkage of population-based cohort data from minors with national register data: a scoping review and comparative legal analysis of four European countries.

Open Res Eur. 2021 Sep 27;1:58. doi: 10.12688/openreseurope.13689.2. eCollection 2021.

Administrative records-based criterion measures.

Mil Psychol. 2023 Jul-Aug;35(4):351-363. doi: 10.1080/08995605.2022.2063614. Epub 2022 May 31.

Strategies to Address Current Challenges in Real-World Evidence Generation in Japan.

Drugs Real World Outcomes. 2023 Jun;10(2):167-176. doi: 10.1007/s40801-023-00371-5. Epub 2023 May 13.

A Novel Approach to Generate a Virtual Population of Human Coronary Arteries for Clinical Trials of Stent Design.

IEEE Open J Eng Med Biol. 2021 May 20;2:201-209. doi: 10.1109/OJEMB.2021.3082328. eCollection 2021.

Artificial intelligence in clinical and translational science: Successes, challenges and opportunities.

Clin Transl Sci. 2022 Feb;15(2):309-321. doi: 10.1111/cts.13175. Epub 2021 Oct 30.

ATLAS: an automated association test using probabilistically linked health records with application to genetic studies.

J Am Med Inform Assoc. 2021 Nov 25;28(12):2582-2592. doi: 10.1093/jamia/ocab187.

Fundamental privacy rights in a pandemic state.

PLoS One. 2021 Jun 2;16(6):e0252169. doi: 10.1371/journal.pone.0252169. eCollection 2021.

Linkage of Hospital Records and Death Certificates by a Search Engine and Machine Learning.

JAMIA Open. 2021 Mar 1;4(1):ooab005. doi: 10.1093/jamiaopen/ooab005. eCollection 2021 Jan.
