Tatum Zuotian, Roos Marco, Gibson Andrew P, Taschner Peter EM, Thompson Mark, Schultes Erik A, Laros Jeroen FJ
Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, the Netherlands; Department of Rheumatology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, the Netherlands.
Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, the Netherlands; Informatics Institute of the Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, the Netherlands.
J Biomed Semantics. 2014 Jun 3;5(Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G):S6. doi: 10.1186/2041-1480-5-S1-S6. eCollection 2014.
Matching and comparing sequence annotations made on different reference sequences is vital to genomics research, yet many annotation formats do not specify the type or version of the reference sequence used. This makes integrating annotations from different sources difficult and error-prone.
As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation that uses the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We define reference sequences as the common domain of integration for sequence annotations, and identify three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the Sequence Ontology (SO) and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. We also present three integration models for sequence annotations that use different reference assemblies.
We demonstrate a working example of a sequence annotation instance and show how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily across different assemblies. We also identify further challenges in modeling reference sequences with the BFO.
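As a rough illustration of the idea of tying each annotation to an explicit, versioned reference sequence and then linking annotations across references, the following is a minimal sketch using plain Python tuples as subject-predicate-object triples. It is not the data model from the paper: the namespace `http://example.org/`, the predicates `onReference`, `start`, `end`, and `mapsTo`, the identifiers `chr1_v1` and `chr1_v2`, and the coordinates are all hypothetical placeholders chosen for this example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and predicate names, not terms from the paper's model.
EX = "http://example.org/"


def annotation_triples(ann_id: str, ref_id: str, start: int, end: int) -> list:
    """Describe one annotation and the exact reference sequence version it sits on."""
    ann = EX + ann_id
    return [
        Triple(ann, EX + "onReference", EX + ref_id),
        Triple(ann, EX + "start", str(start)),
        Triple(ann, EX + "end", str(end)),
    ]


def reference_of(graph: set, ann_id: str) -> Optional[str]:
    """Return the reference sequence an annotation is located on, if stated."""
    ann = EX + ann_id
    for t in graph:
        if t.subject == ann and t.predicate == EX + "onReference":
            return t.obj
    return None


def linked_on_other_reference(graph: set, ann_id: str) -> list:
    """List annotations linked to ann_id that sit on a different reference."""
    ann = EX + ann_id
    own_ref = reference_of(graph, ann_id)
    linked = []
    for t in graph:
        if t.subject == ann and t.predicate == EX + "mapsTo":
            other_id = t.obj[len(EX):]
            if reference_of(graph, other_id) != own_ref:
                linked.append(t.obj)
    return sorted(linked)


# Two annotations on two versions of a reference, with illustrative coordinates.
graph = set(annotation_triples("ann1", "chr1_v1", 100, 200))
graph |= set(annotation_triples("ann2", "chr1_v2", 150, 250))
# A hypothetical cross-reference link between the two annotations.
graph.add(Triple(EX + "ann1", EX + "mapsTo", EX + "ann2"))
```

Because every annotation names its reference explicitly, a query such as `linked_on_other_reference(graph, "ann1")` can tell apart links that cross reference versions from links within the same reference, which is the ambiguity that formats without reference information leave unresolved.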