Stöhr Mark R, Günther Andreas, Majeed Raphael W
UGMLC, German Center for Lung Research (DZL), Justus-Liebig-University, Giessen, Germany.
Stud Health Technol Inform. 2019 Sep 3;267:230-237. doi: 10.3233/SHTI190832.
The German Center for Lung Research (DZL) is a research network dedicated to the study of respiratory diseases. To enable consortium-wide retrospective research and prospective patient recruitment, we integrate data into a central data warehouse. Enhancing the underlying ontology is an ongoing process, for which we developed the Collaborative Metadata Repository (CoMetaR) tool. Its technical infrastructure is based on the Resource Description Framework (RDF) for ontology representation and the distributed version control system Git for storage and versioning. Ontology development involves a considerable amount of data curation, and data provenance improves the feasibility and quality of that curation. Especially in collaborative metadata development, comprehensive annotation of "who contributed what, when and why" is essential. Although RDF and Git repositories are commonly used, no existing solution captures metadata provenance information in sufficient detail. We propose an enhanced composition of standardized RDF statements for detailed provenance representation. Additionally, we developed an algorithm that extracts provenance data from the repository and translates it into the proposed RDF statements.
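The idea of translating Git history into RDF provenance statements can be illustrated with a minimal sketch. It is not the CoMetaR algorithm, whose exact statements the abstract does not give. It assumes the commit history has been exported with `git log --pretty=format:%H%x1f%an%x1f%aI%x1f%s` and maps each commit to terms from the W3C PROV-O vocabulary. The `ex:` namespace, the `Commit` class, and the function names are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Prefix declarations for the Turtle output. The "ex:" namespace is a
# placeholder assumption, not a namespace used by CoMetaR.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/provenance/> .\n"
)


@dataclass
class Commit:
    sha: str      # who/what/when/why are recovered from these four fields
    author: str
    date: str     # strict ISO 8601, as produced by the %aI placeholder
    message: str


def parse_git_log(text: str) -> list[Commit]:
    """Parse lines whose fields are separated by the unit separator (0x1f)."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, author, date, message = line.split("\x1f")
        commits.append(Commit(sha, author, date, message))
    return commits


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def agent_id(name: str) -> str:
    """Derive a local name for the author (hypothetical naming scheme)."""
    return "agent_" + "".join(c if c.isalnum() else "_" for c in name)


def commit_to_turtle(commit: Commit) -> str:
    """Describe one commit as a prov:Activity linked to a prov:Agent."""
    agent = agent_id(commit.author)
    return (
        f"ex:commit_{commit.sha} a prov:Activity ;\n"
        f"    prov:wasAssociatedWith ex:{agent} ;\n"
        f'    prov:endedAtTime "{commit.date}"^^xsd:dateTime ;\n'
        f'    rdfs:label "{escape_literal(commit.message)}" .\n'
        f"ex:{agent} a prov:Agent ;\n"
        f'    rdfs:label "{escape_literal(commit.author)}" .\n'
    )


def log_to_turtle(text: str) -> str:
    """Translate a whole exported log into one Turtle document."""
    return PREFIXES + "\n".join(commit_to_turtle(c) for c in parse_git_log(text))
```

This sketch covers "who", "when", and "why" (author, timestamp, commit message). Recording "what" changed would additionally require comparing the RDF content of consecutive commits and linking each added or removed statement to the commit activity, which is left out here.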