Exploring the Complexity of Real-World Health Data Record Linkage-An Exemplary Study Linking Cancer Registry and Claims Data.

Authors

Lendle Nadja, Kollhorst Bianca, Intemann Timm

Affiliation

Department of Biometry and Data Management, Leibniz-Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany.

Publication

Pharmacoepidemiol Drug Saf. 2025 Apr;34(4):e70120. doi: 10.1002/pds.70120.

DOI: 10.1002/pds.70120
PMID: 40130753
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11934838/
Abstract

PURPOSE

Record linkage based on quasi-identifiers remains an important approach as not every data source provides a comprehensive unique identifier. In this study, the reasons for the failure of a linkage based on quasi-identifiers were examined. Furthermore, informed algorithms using information on gold standard links were developed to investigate the potentially achievable linkage quality based on quasi-identifiers.

METHODS

The study population includes patients from an antidiabetic cohort from German claims and colorectal cancer patients from two German cancer registries. Linkage algorithms were applied using information on gold standard links. Informed linkage algorithms based on deterministic linkage, logistic regression, random forests, gradient boosting, and neural networks were derived and compared. Descriptive analyses were performed to identify reasons for the failure of linkage, such as discrepancies between data sources.
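The deterministic variant mentioned above can be illustrated with a minimal sketch. It assumes exact agreement on a set of quasi-identifiers and accepts a link only when the key is unique in both sources; the field names, record layout, and one-to-one rule are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical quasi-identifier fields (illustrative names, not the study's variables).
KEY_FIELDS = ("birth_year", "sex", "region", "diag_year", "diag_quarter")


def quasi_key(record):
    """Build the comparison key from a record's quasi-identifiers."""
    return tuple(record[field] for field in KEY_FIELDS)


def deterministic_link(claims, registry):
    """Return (claims_id, registry_id) pairs whose key is unique in both sources."""
    claims_by_key = defaultdict(list)
    registry_by_key = defaultdict(list)
    for rec in claims:
        claims_by_key[quasi_key(rec)].append(rec["id"])
    for rec in registry:
        registry_by_key[quasi_key(rec)].append(rec["id"])

    links = []
    for key, claim_ids in claims_by_key.items():
        registry_ids = registry_by_key.get(key, [])
        # Accept only one-to-one agreement; ambiguous keys stay unlinked.
        if len(claim_ids) == 1 and len(registry_ids) == 1:
            links.append((claim_ids[0], registry_ids[0]))
    return links


# Toy data: c1 has exactly one registry candidate, c2 has two (r2 and r3).
claims = [
    {"id": "c1", "birth_year": 1950, "sex": "f", "region": "A",
     "diag_year": 2010, "diag_quarter": 2},
    {"id": "c2", "birth_year": 1948, "sex": "m", "region": "B",
     "diag_year": 2011, "diag_quarter": 1},
]
registry = [
    {"id": "r1", "birth_year": 1950, "sex": "f", "region": "A",
     "diag_year": 2010, "diag_quarter": 2},
    {"id": "r2", "birth_year": 1948, "sex": "m", "region": "B",
     "diag_year": 2011, "diag_quarter": 1},
    {"id": "r3", "birth_year": 1948, "sex": "m", "region": "B",
     "diag_year": 2011, "diag_quarter": 1},
]
links = deterministic_link(claims, registry)
```

In this toy data only c1 is linked, because c2 matches two registry records on the same key. The informed algorithms in the study go further by learning from gold standard links, which this sketch does not attempt.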

RESULTS

A gradient boosting-based linkage approach performed best, achieving a precision (positive predictive value) of 77%, a recall (sensitivity) of 81%, and an F*-measure (combining precision and recall) of 64%. Of 641 patients in GePaRD, 8% were not uniquely identifiable using birth year, sex, area of residence, and year and quarter of diagnosis, whereas 33% of 42 817 cancer registry patients were not uniquely identifiable with these quasi-identifiers.
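How an F*-measure combines precision and recall can be sketched as follows, assuming the usual definition F* = F / (2 - F), where F is the harmonic mean of precision and recall; this is equivalent to TP / (TP + FP + FN). The function names are illustrative, and the abstract does not state how the reported figures were aggregated, so the sketch is not expected to reproduce the reported 64% exactly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_star(precision, recall):
    """F* = F / (2 - F), assuming the usual definition of the F*-measure."""
    f = f_measure(precision, recall)
    return f / (2 - f)


def f_star_from_counts(tp, fp, fn):
    """Equivalent count form: true links among all proposed or missed links."""
    total = tp + fp + fn
    return tp / total if total else 0.0


# Rounded values from the abstract: precision 0.77, recall 0.81.
approx = f_star(0.77, 0.81)
```

With the rounded inputs, approx comes out at about 0.65, close to the reported value. The count form makes the interpretation explicit: F* is the share of correct links among all record pairs that are either linked by the algorithm or are true links.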

CONCLUSIONS

Linkage of German claims and cancer registry data based on quasi-identifiers does result in insufficient linkage quality since subjects cannot be uniquely identified. It is advisable to use unique identifiers from a subsample, if available, to derive informed linkage algorithms for the entire sample. In this case, the machine learning technique gradient boosting has been found to outperform other methods.
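The notion of "not uniquely identifiable" can be made concrete with a small sketch: count how many records share their quasi-identifier combination with at least one other record in the same source. The field names and toy records below are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def share_not_unique(records, fields):
    """Fraction of records whose quasi-identifier key occurs more than once."""
    if not records:
        return 0.0
    keys = [tuple(rec[field] for field in fields) for rec in records]
    counts = Counter(keys)
    ambiguous = sum(1 for key in keys if counts[key] > 1)
    return ambiguous / len(records)


# Hypothetical field names and toy records.
fields = ("birth_year", "sex", "region", "diag_year", "diag_quarter")
records = [
    {"birth_year": 1950, "sex": "f", "region": "A", "diag_year": 2010, "diag_quarter": 2},
    {"birth_year": 1950, "sex": "f", "region": "A", "diag_year": 2010, "diag_quarter": 2},
    {"birth_year": 1948, "sex": "m", "region": "B", "diag_year": 2011, "diag_quarter": 1},
    {"birth_year": 1962, "sex": "m", "region": "C", "diag_year": 2012, "diag_quarter": 4},
]
share = share_not_unique(records, fields)
```

Here two of the four toy records share a key, so share is 0.5. A high share in either source limits what any linkage method based on these fields alone can achieve, which is the point the conclusions make.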


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1be8/11934838/34a1cbd5f171/PDS-34-e70120-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1be8/11934838/d1d490a79ae5/PDS-34-e70120-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1be8/11934838/db34e20020c7/PDS-34-e70120-g002.jpg

Similar articles

1
Exploring the Complexity of Real-World Health Data Record Linkage-An Exemplary Study Linking Cancer Registry and Claims Data.
Pharmacoepidemiol Drug Saf. 2025 Apr;34(4):e70120. doi: 10.1002/pds.70120.
2
Record linkage of claims and cancer registries data-Evaluation of a deterministic linkage approach based on indirect personal identifiers.
Pharmacoepidemiol Drug Saf. 2022 Dec;31(12):1287-1293. doi: 10.1002/pds.5545. Epub 2022 Oct 6.
3
Linkage between Utah All Payers Claims Database and Central Cancer Registry.
Health Serv Res. 2019 Jun;54(3):707-713. doi: 10.1111/1475-6773.13114. Epub 2019 Jan 24.
4
Validity of deterministic record linkage using multiple indirect personal identifiers: linking a large registry to claims data.
Circ Cardiovasc Qual Outcomes. 2014 May;7(3):475-80. doi: 10.1161/CIRCOUTCOMES.113.000294. Epub 2014 Apr 22.
5
Individual mortality information in the German Pharmacoepidemiological Research Database (GePaRD): a validation study using a record linkage with a large cancer registry.
BMJ Open. 2019 Jul 2;9(7):e028223. doi: 10.1136/bmjopen-2018-028223.
6
Implementation of an algorithm for the identification of breast cancer deaths in German health insurance claims data: a validation study based on a record linkage with administrative mortality data.
BMJ Open. 2019 Jul 26;9(7):e026834. doi: 10.1136/bmjopen-2018-026834.
7
Validating mortality in the German Pharmacoepidemiological Research Database (GePaRD) against a mortality registry.
Pharmacoepidemiol Drug Saf. 2016 Jul;25(7):778-84. doi: 10.1002/pds.4005. Epub 2016 Apr 7.
8
Two approaches to linking census and hospital data.
Health Rep. 2014 Oct;25(10):3-14.
9
[Linkage of claims data with data from epidemiological cancer registries: possibilities and limitations in the German federal states].
Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2022 May;65(5):615-623. doi: 10.1007/s00103-021-03475-x. Epub 2021 Dec 23.
10
Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort.
Int J Med Inform. 2024 May;185:105387. doi: 10.1016/j.ijmedinf.2024.105387. Epub 2024 Feb 28.
