Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA.
Database (Oxford). 2011 Sep 20;2011:bar034. doi: 10.1093/database/bar034. Print 2011.
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding of the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature. This integrates third-party controlled vocabularies for chemicals, genes, diseases and organisms with a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349,000 molecular interactions between 6800 chemicals, 20,900 genes (for 330 organisms) and 4300 diseases, manually curated from over 25,400 peer-reviewed articles. These manually curated data are further integrated with other third-party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation, which uses a powerful and efficient paradigm based on mnemonic codes. This strategy allows biocurators to capture detailed information from articles quickly by writing simple statements in which codes represent the relationships between data types. The paradigm is versatile, expandable and able to accommodate new data challenges as they arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org.
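The abstract does not give the syntax of the mnemonic codes. The following is therefore only a minimal sketch of the general idea, assuming a hypothetical `chemical|<degree><action>|gene` statement format. The codes (`+`, `-`, `exp`, `act`, `phos`), the tiny stand-in vocabularies, and the names `Interaction` and `parse_statement` are invented for illustration and are not CTD's actual codes or curation tool. The sketch shows how a short coded chemical–gene statement could be checked against controlled vocabularies as it is entered, a simple form of real-time quality control, and then expanded into a readable interaction.

```python
from dataclasses import dataclass

# HYPOTHETICAL codes and vocabularies, for illustration only (not CTD's actual ones).
DEGREES = {"+": "increased", "-": "decreased"}
ACTIONS = {"exp": "expression", "act": "activity", "phos": "phosphorylation"}
CHEMICALS = {"bisphenol A", "arsenic trioxide"}  # stand-in chemical vocabulary
GENES = {"ESR1", "TP53"}  # stand-in gene vocabulary


@dataclass(frozen=True)
class Interaction:
    """A structured chemical-gene interaction decoded from a coded statement."""
    chemical: str
    degree: str
    action: str
    gene: str

    def to_text(self) -> str:
        """Expand the coded statement into a readable sentence."""
        return (f"{self.chemical} results in {DEGREES[self.degree]} "
                f"{ACTIONS[self.action]} of {self.gene}")


def parse_statement(line: str) -> Interaction:
    """Parse 'chemical|<degree><action>|gene', rejecting unknown codes or terms."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    chemical, code, gene = parts
    if not code or code[0] not in DEGREES:
        raise ValueError(f"unknown degree in code {code!r}")
    degree, action = code[0], code[1:]
    if action not in ACTIONS:
        raise ValueError(f"unknown action code {action!r}")
    if chemical not in CHEMICALS:
        raise ValueError(f"chemical not in vocabulary: {chemical!r}")
    if gene not in GENES:
        raise ValueError(f"gene not in vocabulary: {gene!r}")
    return Interaction(chemical, degree, action, gene)


if __name__ == "__main__":
    record = parse_statement("bisphenol A | +exp | ESR1")
    print(record.to_text())  # bisphenol A results in increased expression of ESR1
```

In this sketch, a typo in a code or term raises an error at entry time instead of reaching the dataset. That is one plausible way a curation tool could enforce quality control as curators work.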