School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
J Am Med Inform Assoc. 2023 May 19;30(6):1091-1102. doi: 10.1093/jamia/ocad050.
We propose a system, quEHRy, to retrieve precise, interpretable answers to natural language questions from structured data in electronic health records (EHRs).
We develop or synthesize the main components of quEHRy: concept normalization (MetaMap), time frame classification (new), semantic parsing (existing), visualization with question understanding (new), and a query module for FHIR mapping/processing (new). We evaluate quEHRy on 2 clinical question answering (QA) datasets, assessing each component separately as well as the system holistically to gain deeper insights. We also conduct a thorough error analysis of a crucial subcomponent, medical concept normalization.
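The component chain above can be sketched as a minimal pipeline. This is a hypothetical illustration only: the function names, the `ParsedQuestion` structure, the toy lexicon, and the FHIR search string are assumptions for exposition, not the authors' actual interfaces.

```python
# Hypothetical sketch of the quEHRy pipeline stages: concept
# normalization -> time frame classification -> semantic parsing ->
# FHIR query generation. All names and mappings are illustrative.
from dataclasses import dataclass, field


@dataclass
class ParsedQuestion:
    concepts: list = field(default_factory=list)  # normalized concepts (e.g., UMLS CUIs)
    time_frame: str = "history"                   # e.g., "visit" vs "history"
    logical_form: str = ""                        # semantic parse behind the query


def normalize_concepts(question: str) -> list:
    """Stand-in for MetaMap concept normalization (toy lexicon)."""
    lexicon = {"diabetes": "C0011847", "metformin": "C0025598"}
    return [cui for term, cui in lexicon.items() if term in question.lower()]


def classify_time_frame(question: str) -> str:
    """Stand-in for the new time frame classifier."""
    return "visit" if "last visit" in question.lower() else "history"


def parse(question: str) -> ParsedQuestion:
    return ParsedQuestion(
        concepts=normalize_concepts(question),
        time_frame=classify_time_frame(question),
        logical_form="lambda p. has_condition(p)",  # placeholder parse
    )


def to_fhir_query(parsed: ParsedQuestion, patient_id: str) -> str:
    """Stand-in for the FHIR mapping/processing query module:
    map the parsed question to a FHIR search over patient resources."""
    return f"Condition?patient={patient_id}&code={','.join(parsed.concepts)}"


parsed = parse("Did the patient have diabetes at the last visit?")
print(to_fhir_query(parsed, "123"))  # Condition?patient=123&code=C0011847
```

The design point the sketch mirrors is that every stage emits an inspectable intermediate (concepts, time frame, logical form), which is what makes the final answer verifiable rather than a black-box prediction.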
Using gold concepts, the precision of quEHRy was 98.33% and 90.91% on the 2 datasets, while the overall accuracy was 97.41% and 87.75%. Precision remained 94.03% and 87.79% even after employing an automated medical concept extraction system (MetaMap). Most incorrectly predicted medical concepts were broader than the gold-annotated concepts (which are representative of those present in EHRs), eg, Diabetes versus Diabetes Mellitus, Non-Insulin-Dependent.
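The failure mode described here can be made concrete with a small sketch: under exact-match evaluation, a broader predicted concept does not count as correct even when it subsumes the gold concept. The `strict_match` and `is_broader` helpers are illustrative assumptions, not the paper's evaluation code.

```python
# Illustrative only: why broader predicted concepts fail strict
# evaluation against gold EHR annotations.
def strict_match(predicted: str, gold: str) -> bool:
    """Exact (case-insensitive) string match against the gold concept."""
    return predicted.strip().lower() == gold.strip().lower()


def is_broader(predicted: str, gold: str) -> bool:
    """Crude heuristic: the prediction names a superset of the gold concept."""
    p, g = predicted.strip().lower(), gold.strip().lower()
    return p != g and p in g


gold = "Diabetes Mellitus, Non-Insulin-Dependent"  # as annotated in the EHR
predicted = "Diabetes"                              # broader extracted concept

print(strict_match(predicted, gold))  # False: scored as an error
print(is_broader(predicted, gold))    # True: the error is one of granularity
```

A heuristic like `is_broader` would only flag the granularity mismatch; resolving it is what the QA-specific, EHR-aware normalizers discussed below would need to do.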
The primary barrier to deploying the system is errors in medical concept extraction (a component not studied in this article), which propagate downstream and prevent the generation of correct logical structures. This indicates the need to build QA-specific clinical concept normalizers that understand EHR context and extract the "relevant" medical concepts from questions.
We present an end-to-end QA system that provides information access to EHRs using natural language and returns an exact, verifiable answer. Our proposed system is high-precision and interpretable, meeting key requirements for clinical use.