

Terminology supported archiving and publication of environmental science data in PANGAEA.

Author information

PANGAEA Data Publisher for Earth & Environmental Science, MARUM Center for Marine Environmental Sciences, University of Bremen, P.O. Box 33 04 40, 28334 Bremen, Germany.


Publication information

J Biotechnol. 2017 Nov 10;261:177-186. doi: 10.1016/j.jbiotec.2017.07.016. Epub 2017 Jul 23.

Abstract

Using the information system PANGAEA as an example, we describe the application of terminologies for archiving and publishing environmental science data. A terminology catalogue (TC) was embedded into the system, with interfaces for replicating terminologies and for editing them manually. For data ingest and archiving, we show how the TC can improve the structuring and harmonization of lineage and content descriptions of data sets. The key is the conceptualization of measurement and observation types (parameters) and methods, for which we have implemented a basic syntax and rule set. For data access and dissemination, we have improved the findability of data by enriching metadata with TC terms. Semantic annotations, e.g. adding term concepts (including synonyms and hierarchies) or mapped terms from different terminologies, facilitate comprehensive data retrieval. The PANGAEA thesaurus of classifying terms, which is part of the TC, is used as an umbrella vocabulary that links the various domains and allows drill-downs and side drills with various facets. Furthermore, we describe how TC terms can be linked to nominal data values. This improves data harmonization and facilitates the structural transformation of heterogeneous data sets to a common schema. Technical developments are complemented by work on the metadata content. Over the last 20 years, more than 100 new parameters have been defined on average per week. Recently, PANGAEA has increasingly been submitting new terms to various terminology services. Matching terms from terminology services with our parameter or method strings is supported programmatically. However, the process ultimately needs manual input by domain experts. The quality of terminology services is an additional limiting factor, and it varies with respect to content, editorial aspects, interoperability, and sustainability. Good-quality terminology services are the building blocks for the conceptualization of parameters and methods. In our view, they are essential for data interoperability and arguably the most difficult hurdle for data integration. In summary, the application of terminologies has a mutually positive effect on terminology services and information systems such as PANGAEA. On both sides, the application of terminologies improves content, reliability and interoperability.
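
The abstract does not specify how the programmatic matching or the metadata enrichment is implemented. As an illustration only, the following minimal Python sketch shows one way such steps could look. Everything in it is an assumption: the three-term catalogue, the identifiers `T1` to `T3`, the function names `suggest_terms` and `expand_query`, the similarity threshold, and the use of the standard-library `difflib.SequenceMatcher` as the similarity measure. It is not PANGAEA's method.

```python
from difflib import SequenceMatcher

# Hypothetical miniature terminology catalogue (illustrative data only):
# term id -> preferred label, synonyms, and the id of its broader term.
CATALOGUE = {
    "T1": {"label": "temperature", "synonyms": [], "broader": None},
    "T2": {"label": "sea surface temperature", "synonyms": ["SST"], "broader": "T1"},
    "T3": {"label": "salinity", "synonyms": ["salt content"], "broader": None},
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def suggest_terms(parameter, catalogue=CATALOGUE, threshold=0.6):
    """Rank candidate terms for a parameter string by string similarity.

    The result is only a list of suggestions; as the abstract notes,
    the final decision needs manual input by a domain expert.
    """
    query = normalize(parameter)
    scored = []
    for term_id, term in catalogue.items():
        names = [term["label"], *term["synonyms"]]
        # Score a term by its best-matching label or synonym.
        best = max(SequenceMatcher(None, query, normalize(n)).ratio() for n in names)
        if best >= threshold:
            scored.append((best, term_id))
    return [term_id for _, term_id in sorted(scored, reverse=True)]


def expand_query(term_id, catalogue=CATALOGUE):
    """Collect the label and synonyms of a term and of all its narrower terms.

    Assumes the broader/narrower hierarchy contains no cycles.
    """
    names = {catalogue[term_id]["label"], *catalogue[term_id]["synonyms"]}
    for child_id, child in catalogue.items():
        if child["broader"] == term_id:
            names |= expand_query(child_id, catalogue)
    return names


if __name__ == "__main__":
    print(suggest_terms("Sea Surface Temperature"))
    print(sorted(expand_query("T1")))
```

In this sketch, a search for the broad term `T1` would also match data sets annotated with its narrower term or with the synonym "SST", which is the kind of comprehensive retrieval the abstract attributes to semantic annotation.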

