ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset.

Affiliations

University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

J Biomed Inform. 2021 May;117:103744. doi: 10.1016/j.jbi.2021.103744. Epub 2021 Mar 26.

Abstract

Fast temporal query on large EHR-derived data sources presents an emerging big data challenge: this query modality is intractable with conventional strategies, which have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted Index (ELII) to optimize the time trade-off between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. An experimental temporal query engine has been implemented in a NoSQL database using the ELII strategy. Near-real-time performance was achieved on a large Covid-19 EHR dataset with 1.3 million unique patients and 3.76 billion records. We evaluated the performance of ELII on several types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII completed these queries in seconds, with average speedups of 26.8 times for relative temporal queries, 88.6 times for absolute temporal queries, and 1037.6 times for classical queries compared with a baseline approach that does not use ELII. Our study suggests that ELII is a promising approach for supporting fast temporal query, an important mode of cohort development for Covid-19 studies.
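The abstract does not describe ELII's internal layout, so the following is only a minimal in-memory sketch of the general idea. It assumes that an event-level inverted index maps each event code to the patients who have that event and the sorted dates on which it occurred. The class name `EventIndex`, its methods, and the event codes `covid_dx` and `resp_failure` are hypothetical; this is not the authors' NoSQL implementation. It illustrates the trade-off named above: one batch `build` pass, after which classical, absolute temporal, and relative temporal queries read only the relevant postings instead of scanning every record.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date


class EventIndex:
    """Hypothetical event-level inverted index (not the authors' implementation).

    Assumed layout: event code -> patient id -> sorted list of event dates.
    """

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(list))

    def build(self, records):
        """One-time batch preprocessing over (patient_id, event_code, day) tuples."""
        for patient_id, event_code, day in records:
            self._postings[event_code][patient_id].append(day)
        # Sort each patient's dates once so later queries can binary-search them.
        for per_patient in self._postings.values():
            for days in per_patient.values():
                days.sort()

    def classical(self, event_code):
        """Non-temporal query: patients who have the event at any time."""
        return set(self._postings.get(event_code, {}))

    def absolute(self, event_code, start, end):
        """Absolute temporal query: patients with the event on a day in [start, end]."""
        hits = set()
        for patient_id, days in self._postings.get(event_code, {}).items():
            i = bisect_left(days, start)  # first date on or after `start`
            if i < len(days) and days[i] <= end:
                hits.add(patient_id)
        return hits

    def relative(self, first, second, max_gap_days):
        """Relative temporal query: `second` occurs 0..max_gap_days days after `first`."""
        a = self._postings.get(first, {})
        b = self._postings.get(second, {})
        hits = set()
        # Only patients present in both posting lists can match.
        for patient_id in a.keys() & b.keys():
            later = b[patient_id]
            for day in a[patient_id]:
                i = bisect_left(later, day)  # earliest `second` on or after `day`
                if i < len(later) and (later[i] - day).days <= max_gap_days:
                    hits.add(patient_id)
                    break
        return hits


# Toy records with hypothetical event codes.
records = [
    ("p1", "covid_dx", date(2020, 4, 1)),
    ("p1", "resp_failure", date(2020, 4, 5)),
    ("p2", "covid_dx", date(2020, 6, 10)),
    ("p2", "resp_failure", date(2020, 8, 1)),
    ("p3", "resp_failure", date(2020, 3, 1)),
]
idx = EventIndex()
idx.build(records)
print(sorted(idx.classical("resp_failure")))  # ['p1', 'p2', 'p3']
print(sorted(idx.absolute("covid_dx", date(2020, 4, 1), date(2020, 4, 30))))  # ['p1']
print(sorted(idx.relative("covid_dx", "resp_failure", max_gap_days=14)))  # ['p1']
```

Sorting each patient's dates once at build time is what lets every query use binary search rather than rescanning raw records. An engine working at the scale reported above would keep such postings in a database rather than in Python dictionaries.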

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a18/9759789/0ecdde940241/ga1_lrg.jpg