Secondary use of electronic health records for building cohort studies through top-down information extraction.

Author information

Kreuzthaler Markus, Schulz Stefan, Berghold Andrea

Affiliation information

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Publication information

J Biomed Inform. 2015 Feb;53:188-95. doi: 10.1016/j.jbi.2014.10.010. Epub 2014 Nov 21.

Abstract

Controlled clinical trials are usually supported by a front-end data aggregation system, which stores relevant information according to the trial context within a highly structured environment. In contrast to the documentation of clinical trials, daily routine documentation has many characteristics that influence data quality. One such characteristic is the use of non-standardized text, which is an indispensable part of information representation in clinical information systems. Based on a cohort study, we highlight challenges in mining electronic health records, targeting free-text entry fields within semi-structured data sources. Our prototypical information extraction system achieved an F-measure of 0.91 (precision=0.90, recall=0.93) for the training set and an F-measure of 0.90 (precision=0.89, recall=0.92) for the test set. We analyze the obtained results in detail and highlight challenges and future directions for the secondary use of routine data in general.
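
The abstract does not say which variant of the F-measure was used. As a minimal sketch, assuming the balanced F1 score (the harmonic mean of precision and recall), the reported figures can be checked for consistency. The function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced F1 score.

    Assumption: the paper's "F-measure" is F1. The abstract does not confirm this.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall values as reported in the abstract.
train_f = f_measure(0.90, 0.93)
test_f = f_measure(0.89, 0.92)
print(f"{train_f:.2f} {test_f:.2f}")  # prints "0.91 0.90"
```

Under that assumption, the rounded values match the F-measures stated in the abstract for both the training set and the test set.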
