University of Texas Health Science Center at Houston, Houston, TX, USA.
J Biomed Inform. 2021 May;117:103744. doi: 10.1016/j.jbi.2021.103744. Epub 2021 Mar 26.
Fast temporal querying of large EHR-derived data sources is an emerging big data challenge, because this query modality is intractable with conventional strategies, which have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach, the Event-level Inverted Index (ELII), to optimize the time trade-off between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. We implemented an experimental temporal query engine in a NoSQL database using the ELII strategy, and it achieved near-real-time performance on a large Covid-19 EHR dataset of 1.3 million unique patients and 3.76 billion records. We evaluated ELII on three types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII completed these queries within seconds, with average speedups of 26.8-fold for relative temporal queries, 88.6-fold for absolute temporal queries, and 1037.6-fold for classical queries compared with a baseline approach that does not use ELII. Our study suggests that ELII is a promising approach for supporting fast temporal query, an important mode of cohort development for Covid-19 studies.
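The abstract does not describe ELII's internal layout or its NoSQL schema. As a rough illustration only, the Python sketch below assumes an in-memory inverted index keyed by event code whose postings are per-patient sorted event dates, built once in a batch step. The record tuples, the event codes `COVID_POS` and `ICU_ADMIT`, and every function name are hypothetical; this is not the authors' implementation. It shows how the three evaluated query types could be answered from such postings instead of scanning every record.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def build_event_index(records):
    """One-time batch step (hypothetical layout): event code -> patient -> sorted dates.

    `records` is assumed to be an iterable of (patient_id, event_code, event_date).
    """
    index = defaultdict(lambda: defaultdict(list))
    for patient, code, when in records:
        index[code][patient].append(when)
    # Sort each posting list once so later queries can use binary search.
    for per_patient in index.values():
        for dates in per_patient.values():
            dates.sort()
    return index


def classical_query(index, code):
    """Non-temporal: patients who ever had the event."""
    return set(index.get(code, {}))


def absolute_query(index, code, start, end):
    """Absolute temporal: patients with the event on a date in [start, end]."""
    hits = set()
    for patient, dates in index.get(code, {}).items():
        i = bisect_left(dates, start)  # first date >= start
        if i < len(dates) and dates[i] <= end:
            hits.add(patient)
    return hits


def relative_query(index, first, second, max_gap_days):
    """Relative temporal: `second` occurs 0..max_gap_days after some `first`."""
    gap = timedelta(days=max_gap_days)
    a_index = index.get(first, {})
    b_index = index.get(second, {})
    hits = set()
    # Only patients present in both posting lists can match.
    for patient in a_index.keys() & b_index.keys():
        b_dates = b_index[patient]
        for a in a_index[patient]:
            j = bisect_left(b_dates, a)  # earliest `second` on or after this `first`
            if j < len(b_dates) and b_dates[j] - a <= gap:
                hits.add(patient)
                break
    return hits


# Usage with made-up data:
records = [
    ("p1", "COVID_POS", date(2020, 4, 1)),
    ("p1", "ICU_ADMIT", date(2020, 4, 10)),
    ("p2", "COVID_POS", date(2020, 6, 5)),
    ("p2", "ICU_ADMIT", date(2020, 8, 1)),
    ("p3", "ICU_ADMIT", date(2020, 3, 1)),
]
idx = build_event_index(records)
print(sorted(classical_query(idx, "ICU_ADMIT")))  # ['p1', 'p2', 'p3']
print(sorted(absolute_query(idx, "COVID_POS", date(2020, 5, 1), date(2020, 12, 31))))  # ['p2']
print(sorted(relative_query(idx, "COVID_POS", "ICU_ADMIT", 14)))  # ['p1']
```

In this sketch, building the sorted postings is the one-time batch cost, and each query then touches only the postings of the event codes it names. That is roughly the kind of preprocessing-versus-query trade-off the abstract refers to; a production engine would persist such postings in the database rather than hold them in memory.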