Information Technology, Harvard Medical School, Boston, Massachusetts 02115, USA.
J Am Med Inform Assoc. 2013 Jun;20(e1):e155-61. doi: 10.1136/amiajnl-2012-001299. Epub 2013 Jan 24.
In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture in which each hospital returns only the aggregate count of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from the result the same query would return if run against a single central repository. This paper describes the situations in which this happens and presents a technique for correcting the resulting errors.
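To make the discrepancy concrete, the following minimal sketch computes the range in which the true number of distinct matching patients must lie when only per-hospital counts are available. It assumes, for illustration only, that a patient matches the query centrally exactly when they match at one or more hospitals; the function name `federated_bounds` and the sample counts are hypothetical and do not come from the paper.

```python
def federated_bounds(counts):
    """Bound the number of distinct matching patients from per-hospital counts.

    Assumption (not from the paper): a patient matches centrally if and only
    if they match at one or more hospitals.
    """
    # Every patient counted by the largest hospital is a distinct match.
    lower = max(counts, default=0)
    # If no patient is shared between hospitals, the counts simply add up.
    upper = sum(counts)
    return lower, upper


# Hypothetical counts returned by four hospitals for one query.
print(federated_bounds([120, 80, 45, 10]))  # (120, 255)
```

The gap between the two bounds is the uncertainty caused by not knowing how much the hospitals' patient populations overlap.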
We use a one-time process to identify which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run independently on each partition, and the combined results are presented to the user.
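The partitioning step might be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the hash function or the demographic fields, so the salted SHA-256 over name and date of birth, and the helper names `demographic_hash` and `build_partitions`, are all hypothetical.

```python
import hashlib
from collections import defaultdict


def demographic_hash(first, last, dob, salt):
    """One-way hash of normalized demographics (field choice is an assumption)."""
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def build_partitions(hospital_hashes):
    """Group patient hashes by the exact set of hospitals that hold their data.

    hospital_hashes: dict mapping hospital name -> set of patient hashes.
    Returns a dict mapping frozenset of hospital names -> set of patient hashes.
    """
    seen_at = defaultdict(set)
    for hospital, hashes in hospital_hashes.items():
        for h in hashes:
            seen_at[h].add(hospital)

    partitions = defaultdict(set)
    for h, hospitals in seen_at.items():
        partitions[frozenset(hospitals)].add(h)
    return dict(partitions)


# Hypothetical data: one patient seen at both hospitals, one seen only at H1.
salt = "shared-secret"
ann = demographic_hash("Ann", "Lee", "1970-01-01", salt)
bob = demographic_hash("Bob", "Ray", "1980-02-02", salt)
parts = build_partitions({"H1": {ann, bob}, "H2": {ann}})
for hospitals, members in parts.items():
    print(sorted(hospitals), len(members))
```

Because only hash values are compared, each hospital can learn which of its patients fall into which partition without exchanging the demographics themselves.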
Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query.
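A small worked example can show why partitioning tightens the estimate. The sketch below assumes, for illustration only, that a patient matches centrally exactly when they match at one or more hospitals, and that each partition's size is known from the one-time matching process. The function name `partitioned_bounds` and all numbers are hypothetical, not results from the paper.

```python
def partitioned_bounds(partition_results):
    """Bound distinct matching patients when queries run per partition.

    partition_results: list of (partition_size, counts), where counts holds
    one aggregate count per hospital in that partition.
    """
    lower = upper = 0
    for size, counts in partition_results:
        # Within a partition, the largest count is a guaranteed minimum.
        lower += max(counts, default=0)
        # Matches cannot exceed the summed counts or the partition size.
        upper += min(sum(counts), size)
    return lower, upper


# Hypothetical two-hospital network (A, B).
results = [
    (1000, [100]),     # patients seen only at A: count is exact
    (800, [60]),       # patients seen only at B: count is exact
    (200, [20, 20]),   # patients seen at both: overlap still uncertain
]
print(partitioned_bounds(results))  # (180, 200)
```

Without partitioning, the same network would report 120 matches at A and 80 at B, giving bounds of 120 to 200 under the same assumption. The single-hospital partitions contribute exact counts, so the remaining uncertainty is confined to the shared partition.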
Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.