Department of Primary Care and Public Health Sciences, King's College London, London, UK.
J Am Med Inform Assoc. 2013 Dec;20(e2):e327-33. doi: 10.1136/amiajnl-2013-001858. Epub 2013 Sep 24.
We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them.
The key feature of the architecture is that queries defined according to the query model are both pre- and post-processed, which addresses both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented.
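The separation of fact extraction from reasoning can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the `Fact` type, the local codes (`DX_250`, `RX_MET`), the in-memory rows and all function names are hypothetical, chosen only to show pre-processing (mapping source-independent concepts to local codes), extraction, post-processing (normalising rows into common facts) and a temporal check that runs on the normalised facts alone.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fact:
    """A source-independent clinical fact (hypothetical structure)."""
    patient_id: str
    concept: str
    when: date


# Hypothetical source data: local codes and ISO date strings.
SOURCE_ROWS = [
    ("p1", "DX_250", "2012-01-10"),
    ("p1", "RX_MET", "2012-02-01"),
    ("p2", "DX_250", "2011-05-03"),
    ("p2", "RX_MET", "2012-09-20"),
]

# Hypothetical mapping used to bridge semantic heterogeneity.
CONCEPT_TO_LOCAL = {"diabetes": "DX_250", "metformin": "RX_MET"}


def pre_process(concepts):
    """Translate source-independent concepts into local codes."""
    return {CONCEPT_TO_LOCAL[c]: c for c in concepts}


def extract(local_codes):
    """Fetch raw rows for the requested local codes; no reasoning here."""
    return [row for row in SOURCE_ROWS if row[1] in local_codes]


def post_process(rows, local_to_concept):
    """Normalise raw rows into source-independent facts."""
    return [
        Fact(pid, local_to_concept[code], date.fromisoformat(day))
        for pid, code, day in rows
    ]


def followed_within(facts, first, second, max_gap):
    """Patients with a `second` fact no more than `max_gap` after a `first` fact."""
    eligible = set()
    for a in facts:
        if a.concept != first:
            continue
        for b in facts:
            same_patient = b.patient_id == a.patient_id
            if same_patient and b.concept == second:
                if timedelta(0) <= b.when - a.when <= max_gap:
                    eligible.add(a.patient_id)
    return eligible


def run():
    mapping = pre_process(["diabetes", "metformin"])
    facts = post_process(extract(set(mapping)), mapping)
    return followed_within(facts, "diabetes", "metformin", timedelta(days=90))
```

In this sketch only `pre_process`, `extract` and `post_process` know anything about the source; `followed_within` sees nothing but normalised facts, so the temporal logic would not change if the source did.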
We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts.
Although the proposed architecture requires greater effort to implement the query model than using SQL alone to access a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once, no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor.
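The "implement once, adapt per source" idea can be sketched as follows. This is an illustrative assumption, not the interface used in the paper: `SourceAdaptor`, `TableAdaptor`, `DocumentAdaptor` and `count_patients` are hypothetical names, and the two in-memory sources merely stand in for structurally different warehouses.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class SourceAdaptor(Protocol):
    """The only thing each data source has to provide (hypothetical interface)."""

    def patients_with(self, concept: str) -> set[str]: ...


class TableAdaptor:
    """Adaptor for a hypothetical row-oriented source of (patient, local code)."""

    def __init__(self, rows, code_map):
        self.rows = rows
        self.code_map = code_map

    def patients_with(self, concept):
        code = self.code_map[concept]
        return {pid for pid, row_code in self.rows if row_code == code}


class DocumentAdaptor:
    """Adaptor for a hypothetical source keyed by patient with lists of terms."""

    def __init__(self, records):
        self.records = records

    def patients_with(self, concept):
        return {pid for pid, terms in self.records.items() if concept in terms}


def count_patients(adaptor: SourceAdaptor,
                   all_of: Iterable[str],
                   none_of: Iterable[str] = ()) -> int:
    """Shared query logic, written once and reused for every adaptor."""
    included = [adaptor.patients_with(c) for c in all_of]
    cohort = set.intersection(*included) if included else set()
    for concept in none_of:
        cohort -= adaptor.patients_with(concept)
    return len(cohort)


table_source = TableAdaptor(
    rows=[("a", "E11"), ("a", "I10"), ("b", "E11")],
    code_map={"diabetes": "E11", "hypertension": "I10"},
)
document_source = DocumentAdaptor(
    records={
        "x": ["diabetes"],
        "y": ["diabetes", "hypertension"],
        "z": ["hypertension"],
    }
)
```

Calling `count_patients(source, ["diabetes"], none_of=["hypertension"])` with either adaptor runs the same inclusion and exclusion logic; supporting a further source would only mean writing another small class with a `patients_with` method.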
The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, demonstrating the feasibility of this approach for handling temporal reasoning and data heterogeneity.