Preserving sequence annotations across reference sequences.

Author information

Tatum Zuotian, Roos Marco, Gibson Andrew P, Taschner Peter EM, Thompson Mark, Schultes Erik A, Laros Jeroen FJ

Affiliations

Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, the Netherlands; Department of Rheumatology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, the Netherlands.

Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, the Netherlands; Informatics Institute of the Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, the Netherlands.

Publication information

J Biomed Semantics. 2014 Jun 3;5(Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G):S6. doi: 10.1186/2041-1480-5-S1-S6. eCollection 2014.

DOI:10.1186/2041-1480-5-S1-S6

PMID:25093075

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4108922/

Abstract

BACKGROUND

Matching and comparing sequence annotations of different reference sequences is vital to genomics research, yet many annotation formats do not specify the reference sequence types or versions used. This makes the integration of annotations from different sources difficult and error-prone.

RESULTS

As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation using the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We defined reference sequences as the common domain of integration for sequence annotations, and identified three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the Sequence Ontology (SO) and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. Moreover, we present three integration models for sequence annotations using different reference assemblies.
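To make concrete the idea of an annotation that states its reference sequence and version explicitly, here is a minimal sketch in plain Python. It is not the authors' RDF model: the `http://example.org/` namespace, the predicate names (`onReference`, `version`, `start`, `end`) and the identifiers `annotA`, `annotB` and `refX` are all hypothetical placeholders chosen for illustration, and the triples are simple tuples rather than a real RDF graph.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """A subject-predicate-object statement, loosely mimicking an RDF triple."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; not taken from the paper or any real ontology.
EX = "http://example.org/"


def annotation_triples(annotation_id: str, reference_id: str,
                       reference_version: int, start: int, end: int) -> List[Triple]:
    """Describe one annotation together with the versioned reference it is defined on."""
    subject = EX + annotation_id
    reference = f"{EX}{reference_id}.{reference_version}"
    return [
        Triple(subject, EX + "onReference", reference),           # assumed predicate
        Triple(reference, EX + "version", str(reference_version)),  # assumed predicate
        Triple(subject, EX + "start", str(start)),                # assumed predicate
        Triple(subject, EX + "end", str(end)),                    # assumed predicate
    ]


def reference_of(triples: List[Triple], annotation_id: str) -> Optional[str]:
    """Return the versioned reference an annotation points at, or None if unstated."""
    subject = EX + annotation_id
    for triple in triples:
        if triple.subject == subject and triple.predicate == EX + "onReference":
            return triple.obj
    return None


def same_reference(triples: List[Triple], first: str, second: str) -> bool:
    """True only when both annotations name the same reference and version."""
    ref_first = reference_of(triples, first)
    ref_second = reference_of(triples, second)
    return ref_first is not None and ref_first == ref_second


# Two annotations on different versions of the same (hypothetical) reference.
graph = (annotation_triples("annotA", "refX", 1, 100, 200)
         + annotation_triples("annotB", "refX", 2, 105, 205))
```

Because each annotation carries its reference explicitly, a check such as `same_reference(graph, "annotA", "annotB")` can tell that the two coordinate ranges are not directly comparable before any integration is attempted.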

CONCLUSIONS

We demonstrated a working example of a sequence annotation instance, and how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily with different assemblies. We also identified other challenges of modeling reference sequences with the BFO.
