The curation paradigm and application tool used for manual curation of the scientific literature at the Comparative Toxicogenomics Database.

Affiliation

Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA.

Publication information

Database (Oxford). 2011 Sep 20;2011:bar034. doi: 10.1093/database/bar034. Print 2011.

Abstract

The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding of the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third-party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349,000 molecular interactions between 6800 chemicals, 20,900 genes (for 330 organisms) and 4300 diseases that have been manually curated from over 25,400 peer-reviewed articles. These manually curated data are further integrated with other third-party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation, which uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements in which codes represent the relationships between data types. The paradigm is versatile, expandable and able to accommodate new data challenges as they arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org.
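
The abstract does not list the mnemonic codes themselves, so the following is only an illustrative sketch of how a short coded statement might be expanded into a structured interaction and checked as it is entered. The code tables, the statement layout `CHEMICAL <degree><action> GENE`, and the names `Interaction` and `parse_statement` are all assumptions made for this example; they are not CTD's actual vocabulary or tool.

```python
from dataclasses import dataclass

# Hypothetical mnemonic codes, invented for illustration only.
# The real CTD interaction vocabulary is not given in the abstract.
DEGREE_CODES = {"+": "increases", "-": "decreases", "~": "affects"}
ACTION_CODES = {"exp": "expression", "act": "activity", "b": "binding"}


@dataclass(frozen=True)
class Interaction:
    """A structured chemical-gene interaction built from a coded statement."""
    chemical: str
    degree: str
    action: str
    gene: str

    def sentence(self) -> str:
        # Render the structured record back into a readable statement.
        return f"{self.chemical} {self.degree} {self.action} of {self.gene}"


def parse_statement(statement: str) -> Interaction:
    """Parse 'CHEMICAL <degree><action> GENE' (an assumed layout)."""
    parts = statement.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {statement!r}")
    chemical, code, gene = parts
    degree_symbol, action_code = code[0], code[1:]
    # Reject unknown codes at entry time, a stand-in for real-time quality control.
    if degree_symbol not in DEGREE_CODES:
        raise ValueError(f"unknown degree symbol: {degree_symbol!r}")
    if action_code not in ACTION_CODES:
        raise ValueError(f"unknown action code: {action_code!r}")
    return Interaction(
        chemical=chemical,
        degree=DEGREE_CODES[degree_symbol],
        action=ACTION_CODES[action_code],
        gene=gene,
    )


if __name__ == "__main__":
    print(parse_statement("chemA +exp GENE1").sentence())
```

In this sketch a curator would type a few characters per interaction, and the lookup tables carry the controlled vocabulary, so extending the paradigm amounts to adding entries to those tables.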
