The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.
Database (Oxford). 2012 Oct 29;2012:bas045. doi: 10.1093/database/bas045. Print 2012.
The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented, particularly with regard to the use of the mouse as a model for investigating the molecular and genetic components of human diseases. These data are collected through literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest, such as 'GO', 'homology' or 'phenotype'. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator reports, such as 'papers selected for GO that refer to genes with NO GO annotation'. Once articles are indexed, curators with expertise in the appropriate disciplines enter the pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. All data associations are supported with statements of evidence as well as access to source publications.
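The indexing step and the curator report described above can be illustrated with a minimal sketch. This is not MGI's actual schema or software: the class names, field names, reference identifiers, gene symbols and the placeholder GO term below are all hypothetical, and the sketch assumes an article may carry more than one area of interest. It only shows how a bibliography indexed by area and gene could yield a report such as 'papers selected for GO that refer to genes with NO GO annotation'.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Article:
    """One entry in a (hypothetical) master bibliography."""
    ref_id: str        # hypothetical bibliography identifier
    areas: Set[str]    # areas of interest, e.g. {"GO", "phenotype"}
    genes: Set[str]    # gene symbols the article has been indexed to


@dataclass
class Annotation:
    """One GO annotation, carrying its evidence and source publication."""
    gene: str
    go_term: str       # placeholder identifier, not a real GO term
    evidence: str      # evidence code supporting the association
    ref_id: str        # source publication


def genes_lacking_go(articles: List[Article],
                     annotations: List[Annotation]) -> Dict[str, List[str]]:
    """Map each GO-selected article to its genes that have no GO annotation."""
    annotated = {a.gene for a in annotations}
    report: Dict[str, List[str]] = {}
    for article in articles:
        if "GO" not in article.areas:
            continue  # article was not selected for GO curation
        missing = sorted(article.genes - annotated)
        if missing:
            report[article.ref_id] = missing
    return report


# Illustrative data only; none of these records come from MGI.
articles = [
    Article("REF:1", {"GO"}, {"GeneA"}),
    Article("REF:2", {"GO", "phenotype"}, {"GeneA", "GeneC"}),
    Article("REF:3", {"homology"}, {"GeneD"}),
]
annotations = [Annotation("GeneA", "GO:0000000", "IDA", "REF:1")]

report = genes_lacking_go(articles, annotations)
print(report)
```

In this sketch each annotation keeps an evidence code and a reference identifier alongside the gene and term, mirroring the statement that all data associations are supported by evidence and a source publication.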