Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708 WE, The Netherlands.
Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences (IHA), Faculty of Life Sciences (BIOVIT), Norwegian University of Life Sciences (NMBU), PO Box 5003, Ås, Norway.
Sci Data. 2019 Nov 4;6(1):254. doi: 10.1038/s41597-019-0263-7.
The RDF data model facilitates the integration of diverse data available in structured and semi-structured formats. To obtain a coherent RDF graph, the chosen ontology must be applied consistently. However, the addition of new, diverse data causes the ontology to evolve, which can lead to an accumulation of unintended, erroneous composites. Thus, there is a need for a gatekeeping system that compares the intended content described in the ontology with the actual content of the resource. The Empusa code generator facilitates the creation of composite RDF resources from disparate sources. Empusa can convert a schema into an associated application programming interface (API) that can be used to perform data consistency checks, and it generates Markdown documentation to make persistent URLs resolvable. With Empusa, consistency is ensured both within and between the ontology and the content of the resource. As an illustration of the potential of Empusa, we present the Genome Biology Ontology Language (GBOL). GBOL uses and extends current ontologies to provide a formal representation of genomic entities, along with their properties, relations and provenance.
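The gatekeeping idea described above, comparing the content intended by the schema with the content actually present in the resource, can be illustrated with a minimal Python sketch. This is not Empusa's generated API or its schema format: the `SCHEMA` dictionary layout, the `check_consistency` function, and the `ex:` class and property names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical schema (not Empusa's format):
# class -> property -> (expected kind, min count, max count or None for unbounded)
SCHEMA = {
    "ex:Gene": {
        "ex:locusTag": ("literal", 1, 1),
        "ex:encodes": ("ex:Protein", 0, None),
    },
    "ex:Protein": {
        "ex:sequence": ("literal", 1, 1),
    },
}


def check_consistency(triples, schema):
    """Compare (subject, predicate, object) triples with the schema.

    Returns a sorted list of human-readable violation messages.
    """
    types = {}                                    # subject -> declared class
    props = defaultdict(lambda: defaultdict(list))  # subject -> property -> values
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s] = o
        else:
            props[s][p].append(o)

    violations = []

    # Subjects that carry properties but never declare a class.
    for subject in props:
        if subject not in types:
            violations.append(f"{subject}: no rdf:type declared")

    for subject, cls in types.items():
        if cls not in schema:
            violations.append(f"{subject}: unknown class {cls}")
            continue
        allowed = schema[cls]
        used = props.get(subject, {})

        # Properties present in the data but absent from the schema.
        for prop in used:
            if prop not in allowed:
                violations.append(
                    f"{subject}: property {prop} not defined for {cls}")

        # Cardinality and range checks for every property the schema defines.
        for prop, (kind, lo, hi) in allowed.items():
            values = used.get(prop, [])
            upper = "n" if hi is None else hi
            if len(values) < lo or (hi is not None and len(values) > hi):
                violations.append(
                    f"{subject}: {prop} has {len(values)} value(s), "
                    f"expected {lo}..{upper}")
            if kind != "literal":
                for value in values:
                    if types.get(value) != kind:
                        violations.append(
                            f"{subject}: {prop} should point to a {kind}, "
                            f"got {value}")
    return sorted(violations)


triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene1", "ex:locusTag", "b0001"),
    ("ex:gene1", "ex:encodes", "ex:prot1"),
    ("ex:prot1", RDF_TYPE, "ex:Protein"),
    ("ex:prot1", "ex:sequence", "MKRISTTITTT"),
    ("ex:gene2", RDF_TYPE, "ex:Gene"),             # missing ex:locusTag
    ("ex:gene2", "ex:product", "leader peptide"),  # property not in schema
]

violations = check_consistency(triples, SCHEMA)
for message in violations:
    print(message)
```

In this toy data, `ex:gene1` and `ex:prot1` conform to the schema, while `ex:gene2` is reported twice: once for the missing `ex:locusTag` and once for the undeclared `ex:product` property. A generated, typed API would typically reject such records at write time rather than report them afterwards; the sketch only shows the comparison itself.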