Departamento de Informática y Sistemas, Universidad de Murcia, Murcia, Spain.
J Am Med Inform Assoc. 2013 Dec;20(e2):e288-96. doi: 10.1136/amiajnl-2013-001923. Epub 2013 Aug 9.
The secondary use of electronic healthcare records (EHRs) often requires the identification of patient cohorts. In this context, an important problem is the heterogeneity of clinical data sources, which can be overcome with the combined use of standardized information models, virtual health records, and semantic technologies, since each of them contributes to solving aspects related to the semantic interoperability of EHR data.
To develop methods that use EHR data directly to identify patient cohorts, leveraging current EHR standards and semantic web technologies.
We propose to combine the strengths of EHR standards and ontologies, building on our previous results and experience with both technological infrastructures. Our main principle is to perform each activity at the abstraction level where the most appropriate technology is available: part of the processing is performed using archetypes (ie, data level) and the rest using ontologies (ie, knowledge level). Our approach starts from EHR data in a proprietary format, which are first normalized and elaborated using EHR standards and then transformed into a semantic representation that is exploited by automated reasoning.
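The three stages named above (normalization at the data level, transformation into a semantic representation, and rule-based classification at the knowledge level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record fields, the `ColonoscopyFinding` class, the `ex:` vocabulary, and the rule in `classify` are all hypothetical. The thresholds are passed in as parameters because no clinical cut-offs are given here. A real system would use archetype tooling and an OWL reasoner rather than plain Python structures.

```python
from dataclasses import dataclass

# Hypothetical proprietary export; field names are invented for illustration.
raw_record = {"pat_id": "p001", "polyp_cnt": "3", "max_polyp_mm": "12"}


@dataclass
class ColonoscopyFinding:
    """Stand-in for an archetype-conformant data instance (data level)."""
    patient_id: str
    polyp_count: int
    max_polyp_size_mm: float


def normalize(raw: dict) -> ColonoscopyFinding:
    """Map proprietary fields onto the standardized structure, fixing types."""
    return ColonoscopyFinding(
        patient_id=raw["pat_id"],
        polyp_count=int(raw["polyp_cnt"]),
        max_polyp_size_mm=float(raw["max_polyp_mm"]),
    )


def to_triples(finding: ColonoscopyFinding) -> list:
    """Express the normalized data as subject-predicate-object triples."""
    subject = f"ex:{finding.patient_id}"
    return [
        (subject, "rdf:type", "ex:Patient"),
        (subject, "ex:polypCount", finding.polyp_count),
        (subject, "ex:maxPolypSizeMm", finding.max_polyp_size_mm),
    ]


def classify(triples: list, count_threshold: int, size_threshold_mm: float) -> list:
    """Toy stand-in for automated reasoning (knowledge level).

    Assigns each patient to a hypothetical risk class; the rule itself is a
    placeholder, not a screening protocol.
    """
    values = {(s, p): o for s, p, o in triples}
    patients = [s for s, p, o in triples if p == "rdf:type" and o == "ex:Patient"]
    inferred = []
    for s in patients:
        higher = (
            values[(s, "ex:polypCount")] >= count_threshold
            or values[(s, "ex:maxPolypSizeMm")] >= size_threshold_mm
        )
        group = "ex:HigherRiskGroup" if higher else "ex:LowerRiskGroup"
        inferred.append((s, "rdf:type", group))
    return inferred


finding = normalize(raw_record)
result = classify(to_triples(finding), count_threshold=3, size_threshold_mm=10.0)
```

Keeping `normalize` separate from `classify` mirrors the principle stated above: type and structure issues are resolved at the data level, so the classification rule only ever sees clean, uniformly named facts.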
We have applied our approach to protocols for colorectal cancer screening. The results comprise the archetypes, ontologies, and datasets developed for the standardization and semantic analysis of EHR data. Anonymized real data have been used, and the patients have been successfully classified according to their risk of developing colorectal cancer.
This work provides new insights into how archetypes and ontologies can be effectively combined for EHR-driven phenotyping. The methodological approach can be applied to other problems, provided that suitable archetypes, ontologies, and classification rules can be designed.