Indiana University School of Medicine, Indianapolis, Indiana, United States.
Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, United States.
Appl Clin Inform. 2024 May;15(3):620-628. doi: 10.1055/a-2291-1391. Epub 2024 Mar 20.
Patient data are fragmented across multiple repositories, yielding suboptimal and costly care. Record linkage algorithms are widely accepted solutions for improving the completeness of patient records. However, studies often fail to fully describe their linkage techniques. Further, while many frameworks evaluate record linkage methods, few focus on producing gold standard datasets. This highlights a need to assess these frameworks and their real-world performance. We use real-world datasets and expand upon previous frameworks to evaluate a consistent approach to the manual review of gold standard datasets and to measure its impact on algorithm performance.
We applied the framework, which includes elements for data description, reviewer training and adjudication, and software and reviewer descriptions, to four datasets. Record pairs were formed, and between 15,000 and 16,500 pairs were randomly sampled. After training, two reviewers independently determined match status for each record pair. If the reviewers disagreed, a third reviewer performed final adjudication.
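The two-reviewer-plus-adjudicator procedure described above can be sketched as a small decision function. This is an illustrative reconstruction, not the authors' code; the function and label names are assumptions.

```python
# Hedged sketch of the multireviewer adjudication process: two reviewers
# label each record pair, and a third reviewer breaks ties on disagreement.
# Names ('match' / 'nonmatch', adjudicate) are illustrative assumptions.

def adjudicate(review_a, review_b, review_c=None):
    """Return the final match status for a record pair.

    If the first two reviewers agree, their shared label is final;
    otherwise the third reviewer's label decides.
    """
    if review_a == review_b:
        return review_a
    if review_c is None:
        raise ValueError("Reviewers disagree; third-reviewer adjudication required")
    return review_c

# Agreement: no third reviewer needed.
print(adjudicate("match", "match"))                  # match
# Disagreement: the third reviewer's label is final.
print(adjudicate("match", "nonmatch", "nonmatch"))   # nonmatch
```

In practice such a function would be applied over all sampled pairs, with the third reviewer consulted only for the discordant subset.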
Across the four datasets, the discordance rate ranged from 1.8% to 13.6%. While reviewer discordance rates typically fell between 1% and 5%, one reached 59%, underscoring the importance of the third reviewer. The original analysis was compared with three sensitivity analyses and most often exhibited the highest predictive values.
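The quantities reported above, the reviewer discordance rate and the predictive values of a linkage algorithm against the adjudicated gold standard, follow standard definitions. The sketch below shows one way to compute them; the function names and data layout are assumptions, not the paper's implementation.

```python
# Illustrative computation of (1) the discordance rate between two
# reviewers and (2) positive/negative predictive values (PPV/NPV) of
# algorithm match decisions against the adjudicated gold standard.

def discordance_rate(labels_a, labels_b):
    """Fraction of record pairs on which two reviewers disagree."""
    disagreements = sum(a != b for a, b in zip(labels_a, labels_b))
    return disagreements / len(labels_a)

def predictive_values(predicted, gold):
    """Return (PPV, NPV) of boolean match predictions vs. the gold standard."""
    tp = sum(p and g for p, g in zip(predicted, gold))          # true matches
    fp = sum(p and not g for p, g in zip(predicted, gold))      # false matches
    tn = sum(not p and not g for p, g in zip(predicted, gold))  # true nonmatches
    fn = sum(not p and g for p, g in zip(predicted, gold))      # missed matches
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

# Two reviewers disagree on 1 of 4 pairs -> 25% discordance.
print(discordance_rate([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.25
```

A sensitivity analysis in this setting would recompute PPV/NPV after resolving discordant pairs differently (e.g., treating them all as matches or all as nonmatches).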
Reviewers vary in their assessment of a gold standard, which can lead to variance in estimates of matching performance. Our analysis demonstrates how a multireviewer process can be applied to create gold standards, identify reviewer discrepancies, and evaluate algorithm performance.