Department of Family Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Regenstrief Institute, Center for Biomedical Informatics, Indianapolis, Indiana, USA.
J Am Med Inform Assoc. 2022 Jul 12;29(8):1409-1415. doi: 10.1093/jamia/ocac068.
This study sought to support evidence-based patient identity policy development by illustrating an approach for formally evaluating operational matching methods, and to characterize the performance of referential and probabilistic patient matching algorithms using real-world demographic data.
We assessed matching accuracy for referential and probabilistic matching algorithms using a manually reviewed 30 000 record gold standard reference dataset derived from a large health information exchange containing over 47 million patient registrations. We applied referential and probabilistic algorithms to this dataset and compared the outputs to the gold standard. We computed performance metrics including sensitivity (recall), positive predictive value (precision), and F-score for each algorithm.
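The metrics named above can be sketched in a few lines. This is a minimal illustration, not the study's evaluation code: it assumes, hypothetically, that both the algorithm output and the gold standard are expressed as unordered pairs of record identifiers judged to be the same patient. The function name `pair_metrics` and the sample identifiers are invented for the example.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def pair_metrics(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Dict[str, float]:
    """Compute sensitivity, PPV, and F-score for predicted match pairs.

    Assumption: each pair links two record IDs; order within a pair is ignored.
    """
    pred = {frozenset(p) for p in predicted}
    truth = {frozenset(p) for p in gold}

    tp = len(pred & truth)   # pairs the algorithm linked that are true matches
    fp = len(pred - truth)   # pairs the algorithm linked that are not matches
    fn = len(truth - pred)   # true matches the algorithm missed

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    ppv = tp / (tp + fp) if (tp + fp) else 0.0           # precision
    denom = ppv + sensitivity
    f_score = 2 * ppv * sensitivity / denom if denom else 0.0  # harmonic mean
    return {"sensitivity": sensitivity, "ppv": ppv, "f_score": f_score}


# Hypothetical toy data: four true matches, three predicted links.
gold_pairs = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
predicted_pairs = [("b", "a"), ("c", "d"), ("x", "y")]
metrics = pair_metrics(predicted_pairs, gold_pairs)
```

In this toy case two of the four true matches are found and one of the three predicted links is wrong, so sensitivity is lower than PPV, the same pattern the abstract reports for both algorithms.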
The probabilistic algorithm exhibited sensitivity, positive predictive value (PPV), and F-score of 0.6366, 0.9995, and 0.7778, respectively. The referential algorithm exhibited corresponding sensitivity, PPV, and F-score values of 0.9351, 0.9996, and 0.9663, respectively. Treating discordant and limited-data records as nonmatches increased referential match sensitivity to 0.9578. Compared to the more traditional probabilistic approach, referential matching exhibited greater accuracy.
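As a quick consistency check, the reported F-scores can be recomputed from the reported sensitivity and PPV, assuming the F-score here is the standard F1 (the harmonic mean of the two). The helper name `f_score` is an illustrative choice, not something taken from the study.

```python
def f_score(sensitivity: float, ppv: float) -> float:
    """F1 score: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values as reported in the results above.
probabilistic_f = round(f_score(0.6366, 0.9995), 4)
referential_f = round(f_score(0.9351, 0.9996), 4)

print(probabilistic_f)  # 0.7778
print(referential_f)    # 0.9663
```

Both recomputed values agree with the reported figures to four decimal places, which also shows that the gap between the two algorithms comes almost entirely from sensitivity, since their PPVs are nearly identical.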
Referential patient matching, an increasingly popular method among health IT vendors, demonstrated notably greater accuracy than a more traditional probabilistic approach, and did so without the data-specific adaptation of the algorithm that the probabilistic approach usually requires. Given the need for an evidence-based patient identity strategy, health IT policymakers, including the Office of the National Coordinator for Health Information Technology (ONC), should explore strategies to expand the evidence base for real-world matching system performance.