ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

Author information

Teodoro Douglas, Sundvall Erik, João Junior Mario, Ruch Patrick, Miranda Freire Sergio

Affiliations

Departamento de Tecnologia da Informação e Educação em Saúde, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil.

SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland.

Publication information

PLoS One. 2018 Jan 2;13(1):e0190028. doi: 10.1371/journal.pone.0190028. eCollection 2018.

DOI:10.1371/journal.pone.0190028

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5749730/

Abstract

The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing information on hospitalisations and high-complexity procedures, and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the use of ORBDA for evaluating the insert throughput and query latency of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
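
As a rough illustration of the two metrics named in the abstract, insert throughput and query latency, the sketch below times both against a toy in-memory store. Everything in it is an assumption made for illustration: the `InMemoryStore` class, the `make_composition` helper and the record fields are hypothetical, and they do not reflect the ORBDA schema, the openEHR composition format, or the benchmark code used by the authors. A real evaluation would replace the store with a client for the database under test and load actual ORBDA JSON compositions.

```python
import json
import statistics
import time


class InMemoryStore:
    """Hypothetical stand-in for a document database; not a real openEHR server."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Store a serialised copy, as a document database would.
        self._docs[doc_id] = json.dumps(document)

    def find_by_ehr(self, ehr_id):
        # Full scan: deliberately naive, so latency grows with dataset size.
        return [
            doc for doc in map(json.loads, self._docs.values())
            if doc["ehr_id"] == ehr_id
        ]


def make_composition(i):
    # Illustrative placeholder record; the field names are assumptions and
    # do not follow the actual ORBDA or openEHR composition schema.
    return {"uid": f"comp-{i}", "ehr_id": f"ehr-{i % 100}", "procedure": "demo"}


def measure_insert_throughput(store, n_docs):
    """Return inserts per second for n_docs sequential inserts."""
    start = time.perf_counter()
    for i in range(n_docs):
        store.insert(f"comp-{i}", make_composition(i))
    elapsed = time.perf_counter() - start
    return n_docs / elapsed


def measure_query_latency(store, ehr_ids):
    """Return the median latency in seconds over one query per EHR id."""
    latencies = []
    for ehr_id in ehr_ids:
        start = time.perf_counter()
        store.find_by_ehr(ehr_id)
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)


store = InMemoryStore()
throughput = measure_insert_throughput(store, 1000)
latency = measure_query_latency(store, [f"ehr-{k}" for k in range(10)])
print(f"insert throughput: {throughput:.0f} docs/s")
print(f"median query latency: {latency * 1000:.3f} ms")
```

The median is used for latency here because a single slow query can distort a mean; reporting percentiles alongside it would be a reasonable extension. The absolute numbers printed by this sketch say nothing about any real database.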


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ebd/5749730/31450055a5f9/pone.0190028.g001.jpg
