Kho Abel N, Cashy John P, Jackson Kathryn L, Pah Adam R, Goel Satyender, Boehnke Jörn, Humphries John Eric, Kominers Scott Duke, Hota Bala N, Sims Shannon A, Malin Bradley A, French Dustin D, Walunas Theresa L, Meltzer David O, Kaleba Erin O, Jones Roderick C, Galanter William L
Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Department of Veterans Affairs, Pittsburgh, PA.
J Am Med Inform Assoc. 2015 Sep;22(5):1072-80. doi: 10.1093/jamia/ocv038. Epub 2015 Jun 23.
To design and implement a tool that creates a secure, privacy-preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research.
The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act (HIPAA)-compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records through a central honest broker, using an algorithm that assigns weights to hash combinations in order to generate high-specificity matches.
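The hashing and weighted-matching steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' software: the seed value, the normalization rules, the identifier combinations, and the weights are all assumptions made for the example, and prepending the seed to the payload is only one plausible way to build a seeded hash.

```python
import hashlib
import unicodedata

# Hypothetical shared seed; in practice sites would agree on it out of band.
SEED = "shared-secret-seed"

# Hypothetical identifier combinations and weights (not taken from the paper).
COMBINATIONS = {
    "name_dob_ssn": (("first", "last", "dob", "ssn"), 1.0),
    "last_dob_ssn": (("last", "dob", "ssn"), 0.9),
    "name_dob": (("first", "last", "dob"), 0.6),
}


def normalize(value):
    """Strip accents, punctuation, and case so every site hashes the same string."""
    if value is None:
        return ""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def hash_record(record, seed=SEED):
    """Return one seeded SHA-512 digest per combination whose fields are all present."""
    hashes = {}
    for name, (fields, _weight) in COMBINATIONS.items():
        parts = [normalize(record.get(field)) for field in fields]
        if all(parts):  # skip combinations with a missing identifier
            payload = seed + "|" + "|".join(parts)
            hashes[name] = hashlib.sha512(payload.encode("utf-8")).hexdigest()
    return hashes


def match_score(hashes_a, hashes_b):
    """Return the largest weight among combinations whose digests agree."""
    score = 0.0
    for name, (_fields, weight) in COMBINATIONS.items():
        if name in hashes_a and hashes_a[name] == hashes_b.get(name):
            score = max(score, weight)
    return score


# The same person as recorded at two sites with different formatting.
site_a = {"first": "Jörn", "last": "O'Neil", "dob": "1980-01-02", "ssn": "123-45-6789"}
site_b = {"first": "jorn", "last": "ONEIL", "dob": "19800102", "ssn": "123456789"}
# Same person again, but with a changed last name and no SSN on file.
site_c = {"first": "Jorn", "last": "Smith", "dob": "1980-01-02", "ssn": None}

print(match_score(hash_record(site_a), hash_record(site_b)))
print(match_score(hash_record(site_a), hash_record(site_c)))
```

In this sketch only the digests would leave each site, so the honest broker compares hash strings and never sees the underlying identifiers. The third record shows why a missing SSN combined with a last-name change defeats every combination defined here.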
The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96% and a specificity of 100%, with a majority of the missed matches accounted for by patients with both a missing Social Security number and a last-name change. Using 3 disease examples, the authors demonstrate that the software can reduce duplication of patient records across sites by as much as 28%.
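For readers less familiar with how linkage is scored against a gold standard, the sketch below computes sensitivity and specificity from sets of record pairs. The function name and the toy pairs are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(predicted, gold_matches, gold_non_matches):
    """Score predicted links against manually reconciled pairs.

    Each argument is a set of (record_id_a, record_id_b) tuples.
    Assumes both gold sets are non-empty.
    """
    true_pos = len(predicted & gold_matches)
    false_neg = len(gold_matches - predicted)
    false_pos = len(predicted & gold_non_matches)
    true_neg = len(gold_non_matches - predicted)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy data (hypothetical): four true matches, two true non-matches.
gold_matches = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
gold_non_matches = {("A1", "B2"), ("A2", "B3")}
# The linker finds three of the four matches and makes no false links.
predicted = {("A1", "B1"), ("A2", "B2"), ("A3", "B3")}

sens, spec = sensitivity_specificity(predicted, gold_matches, gold_non_matches)
print(sens, spec)  # 0.75 1.0
```

A linker tuned for high specificity, as described above, accepts some missed matches (lower sensitivity) in exchange for avoiding false links between different patients.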
Software that standardizes the assignment of a unique seeded hash identifier, merged through an agreed-upon third-party honest broker, can enable large-scale secure linkage of EHR data for epidemiologic and public health research. Because patients may use multiple healthcare systems, the software algorithm can improve future epidemiologic research by providing more comprehensive data.