Datavant, San Francisco, CA.
Northwestern University, Chicago, IL.
AMIA Annu Symp Proc. 2023 Apr 29;2022:692-699. eCollection 2022.
Accurate record linkage depends on the availability and quality of features such as first name and last name. Privacy-preserving record linkage methods that use tokenization are sensitive to perturbations in the patient features used as inputs. In this study, we evaluated the impact of name transformations on the accuracy of patient matching using a large commercial dataset. Using a set of 68 million records representing 59 million unique individuals, we implemented and evaluated eight name transformation strategies and computed precision, recall, and F1 scores. Transforming names to include the most common nicknames resulted in a significant gain in recall while maintaining precision, and yielded the highest F1 score (0.905, versus 0.807 with no name transformation). Strategies tailored to transforming patient features can improve the precision and recall of patient matching and make it possible to create high-quality linked datasets for research purposes.
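To illustrate why a nickname transformation can raise recall in token-based matching, the following is a minimal sketch, not the study's implementation. The nickname table, the token recipe (an unsalted SHA-256 over first name, last name, and date of birth), the toy records, and all function names are assumptions made for illustration; production tokenization typically uses keyed hashing and other fields.

```python
import hashlib
from itertools import combinations

# Hypothetical nickname table (assumption); the abstract does not give the mapping used.
NICKNAMES = {"bill": "william", "will": "william", "liz": "elizabeth", "beth": "elizabeth"}


def canonical_first_name(name):
    """Lowercase and trim a first name, then map a known nickname to its formal form."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def make_token(first, last, dob, transform):
    """Hash the identifying fields into one token (illustrative recipe, an assumption)."""
    first_key = canonical_first_name(first) if transform else first.strip().lower()
    key = "|".join([first_key, last.strip().lower(), dob])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def matched_pairs(records, transform):
    """Return the set of record-id pairs whose tokens are identical."""
    by_token = {}
    for record_id, first, last, dob in records:
        by_token.setdefault(make_token(first, last, dob, transform), []).append(record_id)
    pairs = set()
    for ids in by_token.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def precision_recall_f1(predicted, truth):
    """Pairwise precision, recall, and F1 of predicted links against true links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (fabricated for illustration only): records 1-2 are one person, 3-5 another.
RECORDS = [
    (1, "William", "Smith", "1980-01-02"),
    (2, "Bill", "Smith", "1980-01-02"),
    (3, "Elizabeth", "Jones", "1975-05-06"),
    (4, "Liz", "Jones", "1975-05-06"),
    (5, "Liz", "Jones", "1975-05-06"),
]
TRUE_PAIRS = {(1, 2), (3, 4), (3, 5), (4, 5)}

baseline = precision_recall_f1(matched_pairs(RECORDS, transform=False), TRUE_PAIRS)
with_nicknames = precision_recall_f1(matched_pairs(RECORDS, transform=True), TRUE_PAIRS)
```

In this toy example, exact-match tokens miss every link between a formal name and its nickname, so the baseline loses recall without losing precision. Mapping nicknames to a canonical form before hashing recovers those links. On real data the same transformation can also create false matches, which is why precision has to be measured alongside recall.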