Analysis of Annotated Data Models for Improving Data Quality.

Author information

Ulrich Hannes, Kock-Schoppenhauer Ann-Kristin, Andersen Björn, Ingenerf Josef

Affiliations

IT for Clinical Research, Lübeck (ITCR-L), University of Lübeck, Germany.

Institute of Medical Informatics, University of Lübeck, Germany.

Publication information

Stud Health Technol Inform. 2017;243:190-194.

PMID:28883198

Abstract

The public Medical Data Models (MDM) portal, with more than 9,000 annotated forms from clinical trials and other sources, provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e.g., the semi-interactive curation of core data records in a special domain. Furthermore, it can be used as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. All data models in the MDM portal are represented syntactically in CDISC ODM, and UMLS CUIs are added semi-automatically at several ODM levels, such as ItemGroupDef, ItemDef, or CodeList item. This can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question of how to ensure that semantically similar datasets are also processed and classified similarly. This work describes a (semi-)automatic approach to analysing and assessing items, questions, and data elements in clinical studies. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm uses the widely used NLM MetaMap to provide UMLS support, together with corpus-based proposal algorithms to link datasets from the provided CDISC ODM item pool.
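
As a rough illustration of the corpus-based proposal idea mentioned in the abstract, the following Python sketch reads ItemDef elements from a small CDISC ODM fragment, separates annotated from un-annotated items, and proposes CUIs for the un-annotated ones by borrowing them from the most similar annotated question text. It is not the authors' algorithm: the token-overlap (Jaccard) similarity, the threshold, the function names, and the sample fragment are all assumptions made for this example, and MetaMap is not invoked. The sketch also assumes that annotations are stored as Alias elements whose Context attribute starts with "UMLS CUI"; the CUI values shown are illustrative.

```python
import re
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical miniature ODM fragment; real portal exports are far larger.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.1">
    <MetaDataVersion OID="MDV.1" Name="demo">
      <ItemDef OID="I.1" Name="Age" DataType="integer">
        <Question><TranslatedText xml:lang="en">Age of patient in years</TranslatedText></Question>
        <Alias Context="UMLS CUI [1]" Name="C0001779"/>
      </ItemDef>
      <ItemDef OID="I.2" Name="Weight" DataType="float">
        <Question><TranslatedText xml:lang="en">Body weight in kg</TranslatedText></Question>
        <Alias Context="UMLS CUI [1]" Name="C0005910"/>
      </ItemDef>
      <ItemDef OID="I.3" Name="PatAge" DataType="integer">
        <Question><TranslatedText xml:lang="en">Patient age (years)</TranslatedText></Question>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def load_items(odm_xml):
    """Return a list of (oid, question_text, cuis) tuples, one per ItemDef."""
    root = ET.fromstring(odm_xml)
    items = []
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        node = item.find("odm:Question/odm:TranslatedText", ODM_NS)
        # Fall back to the item name when no question text is present.
        has_text = node is not None and node.text
        text = node.text if has_text else item.get("Name", "")
        # Assumption: CUIs live in Alias elements with a "UMLS CUI" context.
        cuis = [
            alias.get("Name")
            for alias in item.findall("odm:Alias", ODM_NS)
            if alias.get("Context", "").startswith("UMLS CUI")
        ]
        items.append((item.get("OID"), text, cuis))
    return items


def tokens(text):
    """Lower-case alphanumeric tokens of a question text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_annotations(items, threshold=0.3):
    """Map each un-annotated item OID to (proposed_cuis, similarity score)."""
    annotated = [(tokens(text), cuis) for _, text, cuis in items if cuis]
    proposals = {}
    for oid, text, cuis in items:
        if cuis or not annotated:
            continue  # already annotated, or nothing to borrow from
        query = tokens(text)
        score, best_cuis = max(
            ((jaccard(query, cand), cand_cuis) for cand, cand_cuis in annotated),
            key=lambda pair: pair[0],
        )
        if score >= threshold:
            proposals[oid] = (best_cuis, score)
    return proposals


if __name__ == "__main__":
    for oid, (cuis, score) in propose_annotations(load_items(SAMPLE_ODM)).items():
        print(f"{oid}: propose {cuis} (similarity {score:.2f})")
```

In this toy pool, the un-annotated item I.3 shares three of five distinct tokens with I.1 and none with I.2, so it receives I.1's code as a proposal. A proposal of this kind is only a candidate for review and does not establish that the borrowed code is correct.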
