Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, MS 39762, USA.
Database (Oxford). 2012 Nov 17;2012:bas038. doi: 10.1093/database/bas038. Print 2012.
AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and the Plant Ontology, as appropriate. Unlike the literature for model organism species, the body of literature for agricultural species does not focus solely on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface, which ranks gene products for annotation. Biocurators select the top-ranked genes and mark annotation for these genes as 'in progress' or 'completed'; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface for adding and modifying AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based text-mining tool that associates genes with ranked, informative terms (iTerms) and with the articles and sentences containing those terms. Moreover, iTerms are linked to GO terms when they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene in question, and applies filtering to remove abstracts that mention the gene only in passing. The BI is linked to our Journal Database (JDB), where the corresponding journal citations are stored. Just as importantly, biocurators also record in the JDB those citations that yield no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our Inferred from Electronic Annotation (IEA) of agricultural gene products. All annotations must pass standard GO Consortium quality checking before release in AgBase.
Database URL: http://www.agbase.msstate.edu/.
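The abstract states that an iTerm is linked to a GO term when it matches either the GO term name or one of its synonyms. The following is a minimal sketch of that matching step, assuming exact, case-insensitive string comparison; it is not eGIFT's actual implementation, and the mini-ontology, the function names `build_lookup` and `link_iterms`, and the sample iTerms are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-ontology: GO ID -> (term name, synonyms).
# Entries are illustrative only; a real pipeline would load the full GO release.
GO_TERMS: Dict[str, Tuple[str, List[str]]] = {
    "GO:0006915": ("apoptotic process", ["apoptosis"]),
    "GO:0006954": ("inflammatory response", []),
}


def build_lookup(go_terms: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each lower-cased GO term name and synonym to its GO ID."""
    lookup: Dict[str, str] = {}
    for go_id, (name, synonyms) in go_terms.items():
        for label in [name, *synonyms]:
            lookup[label.lower()] = go_id
    return lookup


def link_iterms(iterms: List[str], lookup: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (iTerm, GO ID) pairs for iTerms that exactly match a GO name or synonym.

    Input order is preserved, so a ranked iTerm list stays ranked.
    """
    links: List[Tuple[str, str]] = []
    for term in iterms:
        go_id = lookup.get(term.strip().lower())
        if go_id is not None:
            links.append((term, go_id))
    return links


if __name__ == "__main__":
    lookup = build_lookup(GO_TERMS)
    # Hypothetical ranked iTerms for one gene product.
    ranked_iterms = ["Apoptosis", "caspase", "inflammatory response"]
    # Prints [('Apoptosis', 'GO:0006915'), ('inflammatory response', 'GO:0006954')]
    print(link_iterms(ranked_iterms, lookup))
```

In this sketch "caspase" is dropped because it matches no GO name or synonym in the toy ontology; a curator would then use the surviving links to jump to literature supporting a candidate GO term.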