Jiang Guoqian, Solbrig Harold R, Prud'hommeaux Eric, Tao Cui, Weng Chunhua, Chute Christopher G
Department of Health Sciences Research, Mayo Clinic, Rochester, MN.
W3C/MIT, Boston, MA.
AMIA Annu Symp Proc. 2015 Nov 5;2015:659-68. eCollection 2015.
Domain-specific common data elements (CDEs) are emerging as an effective approach to standards-based clinical research data storage and retrieval. A limiting factor, however, is the lack of robust automated quality assurance (QA) tools for the CDEs in clinical study domains. The objectives of the present study were to prototype and evaluate a QA tool for cancer study CDEs using a post-coordination approach. We first integrated the NCI caDSR CDEs and The Cancer Genome Atlas (TCGA) data dictionaries in a single Resource Description Framework (RDF) data store. We then designed a compositional expression pattern based on the Data Element Concept model structure informed by ISO/IEC 11179, and developed a transformation tool that converts the pattern-based compositional expressions into Web Ontology Language (OWL) syntax. Invoking reasoning and explanation services, we tested the system using CDEs extracted from two TCGA clinical cancer study domains. The system was able to automatically identify duplicate CDEs and detect CDE modeling errors. In conclusion, compositional expressions not only enable reuse of existing ontology codes to define new domain concepts, but also provide an automated mechanism for QA of terminological annotations for CDEs.
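The pipeline described in the abstract (compositional expression, then OWL, then duplicate detection) can be illustrated with a minimal sketch. It is not the authors' tool. It assumes that a Data Element Concept is composed of an Object Class and a Property, as in ISO/IEC 11179, and that each of these carries one primary concept code plus optional qualifier codes. The class names, the `has_object_class`, `has_property` and `has_qualifier` relations, and the concept codes are hypothetical placeholders. Duplicate detection here is a plain structural comparison standing in for the OWL reasoner used in the study, so it finds only syntactically identical definitions, not logically equivalent ones.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Component:
    """An Object Class or a Property: one primary concept plus optional qualifiers."""
    primary: str
    qualifiers: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class CompositionalExpression:
    """A Data Element Concept expressed as Object Class + Property (assumed pattern)."""
    object_class: Component
    prop: Component


def component_to_owl(c: Component) -> str:
    """Render one component as an OWL functional-syntax class expression."""
    if not c.qualifiers:
        return f":{c.primary}"
    # Qualifiers are sorted so that the output does not depend on set ordering.
    parts = [f":{c.primary}"] + [
        f"ObjectSomeValuesFrom(:has_qualifier :{q})" for q in sorted(c.qualifiers)
    ]
    return "ObjectIntersectionOf(" + " ".join(parts) + ")"


def to_owl(cde_id: str, expr: CompositionalExpression) -> str:
    """Define the CDE as equivalent to its compositional expression."""
    body = (
        "ObjectIntersectionOf("
        f"ObjectSomeValuesFrom(:has_object_class {component_to_owl(expr.object_class)}) "
        f"ObjectSomeValuesFrom(:has_property {component_to_owl(expr.prop)}))"
    )
    return f"EquivalentClasses(:{cde_id} {body})"


def find_duplicates(cdes: Dict[str, CompositionalExpression]) -> List[List[str]]:
    """Group CDE identifiers that share an identical compositional expression.

    Stand-in for reasoner-based equivalence checking: structural equality only.
    """
    groups = defaultdict(list)
    for cde_id, expr in cdes.items():
        groups[expr].append(cde_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical CDEs with placeholder concept codes (not real NCIt codes).
cdes = {
    "CDE_A": CompositionalExpression(
        Component("C_PATIENT"), Component("C_AGE", frozenset({"C_DIAGNOSIS"}))
    ),
    "CDE_B": CompositionalExpression(
        Component("C_PATIENT"), Component("C_AGE", frozenset({"C_DIAGNOSIS"}))
    ),
    "CDE_C": CompositionalExpression(Component("C_PATIENT"), Component("C_GENDER")),
}

for cde_id, expr in cdes.items():
    print(to_owl(cde_id, expr))
print(find_duplicates(cdes))
```

In this sketch `CDE_A` and `CDE_B` are reported as duplicates because their definitions are identical. A description logic reasoner would additionally classify the generated OWL axioms against the source ontology, which is what allows equivalent but differently written definitions, and inconsistent ones, to be detected.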