Kreuzthaler Markus, Schulz Stefan, Berghold Andrea
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
J Biomed Inform. 2015 Feb;53:188-95. doi: 10.1016/j.jbi.2014.10.010. Epub 2014 Nov 21.
Controlled clinical trials are usually supported by a front-end data aggregation system, which stores the information relevant to the trial context in a highly structured environment. In contrast to clinical trial documentation, routine clinical documentation has many characteristics that influence data quality. One such characteristic is the use of non-standardized text, which is an indispensable part of information representation in clinical information systems. Based on a cohort study, we highlight the challenges of mining electronic health records, focusing on free-text entry fields within semi-structured data sources. Our prototypical information extraction system achieved an F-measure of 0.91 (precision = 0.90, recall = 0.93) on the training set and an F-measure of 0.90 (precision = 0.89, recall = 0.92) on the test set. We analyze the results in detail and highlight challenges and future directions for the secondary use of routine data in general.
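The abstract does not state which F-measure variant was used. Assuming the usual balanced F1 score (the harmonic mean of precision and recall, beta = 1), the reported values can be checked against the reported precision and recall. The helper `f_measure` below is a hypothetical illustration, not part of the authors' system.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract: (precision, recall, reported F-measure)
reported = {
    "training": (0.90, 0.93, 0.91),
    "test": (0.89, 0.92, 0.90),
}

for split, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    # Compare the recomputed score, rounded to two decimals, with the reported one
    print(f"{split}: F = {f:.4f}, rounded {f:.2f}, reported {f_reported:.2f}")
```

Rounded to two decimals, both recomputed scores match the reported ones (0.91 and 0.90). This is consistent with a balanced F1, although small rounding differences in the reported precision and recall would not reveal a different beta.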