
Secondary use of electronic health records for building cohort studies through top-down information extraction.

Author information

Kreuzthaler Markus, Schulz Stefan, Berghold Andrea

Affiliations

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Publication information

J Biomed Inform. 2015 Feb;53:188-95. doi: 10.1016/j.jbi.2014.10.010. Epub 2014 Nov 21.

Abstract

Controlled clinical trials are usually supported with an in-front data aggregation system, which supports the storage of relevant information according to the trial context within a highly structured environment. In contrast to the documentation of clinical trials, daily routine documentation has many characteristics that influence data quality. One such characteristic is the use of non-standardized text, which is an indispensable part of information representation in clinical information systems. Based on a cohort study we highlight challenges for mining electronic health records targeting free text entry fields within semi-structured data sources. Our prototypical information extraction system achieved an F-measure of 0.91 (precision=0.90, recall=0.93) for the training set and an F-measure of 0.90 (precision=0.89, recall=0.92) for the test set. We analyze the obtained results in detail and highlight challenges and future directions for the secondary use of routine data in general.
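
The reported scores can be cross-checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state which variant was used, and the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 variant.
    """
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported F-measure) as quoted in the abstract
reported = {
    "training": (0.90, 0.93, 0.91),
    "test": (0.89, 0.92, 0.90),
}

for name, (p, r, f_reported) in reported.items():
    f_computed = round(f_measure(p, r), 2)
    print(f"{name}: computed={f_computed:.2f}, reported={f_reported:.2f}")
```

Under this assumption, the recomputed values agree with the reported F-measures for both sets to two decimal places.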
