Searle Stephen M J, Gilbert James, Iyer Vivek, Clamp Michele
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Genome Res. 2004 May;14(5):963-70. doi: 10.1101/gr.1864804.
With the completion of the human genome sequence and the availability of genome sequence for other vertebrates, manual annotation at whole-genome scale has become a priority. Possibly even more important is the requirement to curate and improve this annotation in the light of future data. For this to be possible, tools are needed to access and manage the annotation. Ensembl provides an excellent means of storing gene structures, genome features, and sequence, but it does not support the extra textual data necessary for manual annotation. We have extended Ensembl to create the Otter manual annotation system. This comprises a relational database schema for storing the manual annotation data, an application programming interface (API) to access it, an extensible markup language (XML) format to allow transfer of the data, and a server to allow multiuser/multimachine access to the data. We have also written a data-adaptor plugin for the Apollo Browser/Editor to enable it to use an Otter server. The Otter database is currently used by the Vertebrate Genome Annotation (VEGA) site (http://vega.sanger.ac.uk), which provides access to manually curated human chromosomes. Support is also being developed for the AceDB annotation editor, FMap, via a Perl wrapper called Lace. The Human and Vertebrate Annotation (HAVANA) group annotators at the Sanger Institute are using this system to annotate human chromosomes 1 and 20.
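The abstract describes an XML format that carries gene structures together with the free-text annotation that Ensembl alone does not store. The sketch below illustrates that general idea only. The element and attribute names (`gene`, `description`, `remark`, `exon`, `start`, `end`) and both function names are hypothetical and are not taken from the actual Otter XML schema or API.

```python
import xml.etree.ElementTree as ET


def gene_to_xml(name, description, remarks, exons):
    """Serialize one annotated gene to an XML string.

    All tag and attribute names here are hypothetical, chosen for
    illustration; they are not the real Otter XML format.
    """
    gene = ET.Element("gene", attrib={"name": name})
    # Free-text annotation travels alongside the structural data.
    ET.SubElement(gene, "description").text = description
    for remark in remarks:
        ET.SubElement(gene, "remark").text = remark
    # Structural data: one element per exon with integer coordinates.
    for start, end in exons:
        ET.SubElement(gene, "exon", attrib={"start": str(start), "end": str(end)})
    return ET.tostring(gene, encoding="unicode")


def gene_from_xml(xml_text):
    """Parse the hypothetical XML record back into a plain dictionary."""
    gene = ET.fromstring(xml_text)
    return {
        "name": gene.get("name"),
        "description": gene.findtext("description"),
        "remarks": [r.text for r in gene.findall("remark")],
        "exons": [(int(e.get("start")), int(e.get("end")))
                  for e in gene.findall("exon")],
    }


if __name__ == "__main__":
    xml_text = gene_to_xml(
        "GENE1", "example gene", ["curated by hand"], [(100, 200), (300, 450)]
    )
    print(xml_text)
    print(gene_from_xml(xml_text))
```

A round trip through the two functions returns the same name, remarks, and exon coordinates, which is the property a transfer format between an annotation editor and a server would need to preserve.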