Facultad de Informática, Universidad de Murcia, Murcia, Spain.
J Med Syst. 2012 Nov;36 Suppl 1:S11-23. doi: 10.1007/s10916-012-9890-7.
Genome sequencing projects generate vast amounts of data of widely varying types and complexity, and at a growing pace. Traditionally, the annotation of such sequences was difficult to share with other researchers. Although this has improved with the development and application of biological ontologies, annotation efforts remain isolated, since only a limited amount of information from other annotation projects can be reused. In addition, they do not benefit from the translational information available for the genomic sequences. In this paper, we describe a system that supports genome annotation processes by providing useful information about orthologous genes and the genetic disorders that can be associated with a gene identified in a sequence. The seamless integration of such data will be facilitated by an ontological infrastructure which, following best practices in ontology engineering, will reuse existing biological ontologies such as the Sequence Ontology or Ontological Gene Orthology.