A simple heuristic for blindfolded record linkage.

Affiliation

Center for Clinical Informatics, Stanford University, Stanford, California 94305, USA.

Publication information

J Am Med Inform Assoc. 2012 Jun;19(e1):e157-61. doi: 10.1136/amiajnl-2011-000329. Epub 2012 Feb 1.

Abstract

OBJECTIVES

To address the challenge of balancing privacy with the need to create cross-site research registry records on individual patients, while matching the data for a given patient as he or she moves between participating sites. To evaluate the strategy of generating anonymous identifiers based on real identifiers in such a way that the chances of a shared patient being accurately identified were maximized, and the chances of incorrectly joining two records belonging to different people were minimized.

METHODS

Our hypothesis was that most variation in names occurs after the first two letters, and that date of birth is highly reliable, so a single match variable consisting of a hashed string built from the first two letters of the patient's first and last names plus their date of birth would have the desired characteristics. We compared the match characteristics (false-positive rate vs. false-negative rate) of our chosen variable against those of both Social Security Numbers (SSNs) and full names.
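The match variable described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not name a hash function, a normalization scheme, or a salt, so the use of SHA-256, the accent-stripping and uppercasing step, the ISO date format, and the optional `salt` parameter are all assumptions, and `match_key` is a hypothetical name.

```python
import hashlib
import unicodedata
from datetime import date


def _prefix(name: str) -> str:
    """First two letters of a name, uppercased (assumed normalization)."""
    # Decompose accented characters, then keep letters only, so that
    # "José" and "Jose" yield the same prefix and hyphens/spaces are dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(ch for ch in decomposed if ch.isalpha())
    return letters[:2].upper()


def match_key(first_name: str, last_name: str, dob: date, salt: str = "") -> str:
    """Hash of first-name prefix + last-name prefix + date of birth.

    SHA-256 and the optional site-shared salt are assumptions made for
    this sketch; the source only says the string is hashed.
    """
    raw = f"{_prefix(first_name)}{_prefix(last_name)}{dob.isoformat()}"
    return hashlib.sha256((salt + raw).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Spelling variants after the second letter map to the same key.
    a = match_key("Jonathan", "Smith", date(1970, 1, 2))
    b = match_key("Jon", "Smyth", date(1970, 1, 2))
    print(a == b)
```

Each site would compute the key locally and share only the hash, so records can be joined on equality of keys without exchanging names or dates of birth. A variant that differs within the first two letters (for example "Bill" vs. "William") would not match under this scheme.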

RESULTS

In a data set of 19 000 records, a derived match variable consisting of a 2-character prefix from both first and last names combined with date of birth had a sensitivity of 97%; by contrast, an anonymized identifier based on the patient's full names and date of birth had a sensitivity of only 87%, and SSN had a sensitivity of 86%.
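For readers unfamiliar with how such figures are derived, sensitivity here is the fraction of true same-patient record pairs that the match variable actually links, TP / (TP + FN). The sketch below shows one way to count true positives, false positives, and false negatives given a set of known true pairs. The function name `linkage_metrics`, the record IDs, and the toy keys are hypothetical and are not taken from the study's data.

```python
from itertools import product


def linkage_metrics(keys_a: dict, keys_b: dict, true_pairs) -> dict:
    """Compare key-equality links between two sites against known true pairs.

    keys_a, keys_b: record ID -> match key for each site.
    true_pairs: iterable of (id_a, id_b) known to be the same patient.
    """
    truth = set(true_pairs)
    tp = fp = 0
    # Every cross-site pair with identical keys counts as a declared link.
    for (id_a, key_a), (id_b, key_b) in product(keys_a.items(), keys_b.items()):
        if key_a == key_b:
            if (id_a, id_b) in truth:
                tp += 1
            else:
                fp += 1
    fn = len(truth) - tp
    sensitivity = tp / (tp + fn) if truth else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Toy, made-up keys: a2/b2 is a missed true match, a3/b4 a false link.
    site_a = {"a1": "k1", "a2": "k2", "a3": "k3"}
    site_b = {"b1": "k1", "b2": "kX", "b3": "k3", "b4": "k3"}
    truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    print(linkage_metrics(site_a, site_b, truth))
```

The all-pairs loop is quadratic and only suitable for small illustrations; a real comparison over thousands of records would index one site's keys in a dictionary first.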

CONCLUSION

The approach we describe is most useful in situations where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms. For data sets of sufficiently high quality, this approach produces a lower rate of matching than more complex algorithms, but it is easy to explain to institutional review boards, adheres to the minimum necessary standard of the HIPAA Privacy Rule, and is faster and less cumbersome to implement than a full probabilistic linkage.
