Collaborative biocuration--text-mining development task for document prioritization for curation.

Affiliation

Department of Biology, North Carolina State University, Raleigh, NC 27695-7617, USA.

Publication information

Database (Oxford). 2012 Nov 22;2012:bas037. doi: 10.1093/database/bas037. Print 2012.

DOI:10.1093/database/bas037

PMID:23180769

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3504477/

Abstract

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, covering independent but complementary aspects of data curation on which it sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) across articles; the effectiveness of 'information retrieval' (IR) was also measured based on 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49%, 65% and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated for functionality and ease of use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
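
The abstract reports two kinds of score without spelling them out. As a minimal sketch, assuming the textbook definitions (binary relevance, where each article either is or is not curatable for a given query, and recall pooled over all articles rather than averaged per article), the Python below shows how mean average precision (MAP) and aggregate recall are computed. The function names and the toy PMIDs and gene symbols are hypothetical; the abstract does not state the track's exact query sets, entity-matching rules or tie handling.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list with binary relevance.

    Precision is taken at each rank where a relevant article appears and
    summed; the sum is divided by the total number of relevant articles,
    so relevant articles never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of average precision over all ranked lists (queries)."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def aggregate_recall(per_article: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Micro-averaged recall: curated entities found, summed over all
    articles, divided by all curated entities, summed over all articles."""
    found = 0
    total = 0
    for predicted, gold in per_article:
        found += len(predicted & gold)
        total += len(gold)
    return found / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical ranking for one query; pmid1 and pmid2 are the curatable ones.
    ranking = ["pmid3", "pmid1", "pmid7", "pmid2"]
    curatable = {"pmid1", "pmid2"}
    print(average_precision(ranking, curatable))  # 0.5
    print(mean_average_precision([(ranking, curatable), (["a", "b"], {"a"})]))  # 0.75

    # Hypothetical (predicted, curated) gene sets for two articles.
    articles = [({"BRCA1", "TP53"}, {"BRCA1", "TP53", "EGFR"}), ({"CYP1A1"}, {"AHR"})]
    print(aggregate_recall(articles))  # 0.5
```

Because average precision only accumulates precision at ranks where a relevant article appears, a system scores higher the earlier it places curatable articles, which is the property a triage ranking needs.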

Similar articles

Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Database (Oxford). 2014 Jun 10;2014. doi: 10.1093/database/bau050. Print 2014.

Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

Database (Oxford). 2012 Nov 17;2012:bas043. doi: 10.1093/database/bas043. Print 2012.

Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

PLoS One. 2013 Apr 17;8(4):e58201. doi: 10.1371/journal.pone.0058201. Print 2013.

BioCreative III interactive task: an overview.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.

An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Database (Oxford). 2013 Jan 17;2013:bas056. doi: 10.1093/database/bas056. Print 2013.

Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.

Database (Oxford). 2012 Dec 5;2012:bas050. doi: 10.1093/database/bas050. Print 2012.

Using the OntoGene pipeline for the triage task of BioCreative 2012.

Database (Oxford). 2013 Feb 9;2013:bas053. doi: 10.1093/database/bas053. Print 2013.

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.

Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD).

BMC Bioinformatics. 2009 Oct 8;10:326. doi: 10.1186/1471-2105-10-326.

Cited by

Integrating AI-powered text mining from PubTator into the manual curation workflow at the Comparative Toxicogenomics Database.

Database (Oxford). 2025 Feb 21;2025. doi: 10.1093/database/baaf013.

Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction.

Eur J Med Res. 2024 Aug 2;29(1):404. doi: 10.1186/s40001-024-01983-5.

A High Recall Classifier for Selecting Articles for MEDLINE Indexing.

AMIA Annu Symp Proc. 2020 Mar 4;2019:727-734. eCollection 2019.

PGxCorpus, a manually annotated corpus for pharmacogenomics.

Sci Data. 2020 Jan 2;7(1):3. doi: 10.1038/s41597-019-0342-9.

Automated assessment of biological database assertions using the scientific literature.

BMC Bioinformatics. 2019 Apr 29;20(1):216. doi: 10.1186/s12859-019-2801-x.

Assisting document triage for human kinome curation via machine learning.

Database (Oxford). 2018 Jan 1;2018:bay091. doi: 10.1093/database/bay091.

Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Adv Exp Med Biol. 2016;939:139-166. doi: 10.1007/978-981-10-1503-8_7.

The Comparative Toxicogenomics Database: update 2017.

Nucleic Acids Res. 2017 Jan 4;45(D1):D972-D978. doi: 10.1093/nar/gkw838. Epub 2016 Sep 19.

Chemical-induced disease relation extraction with various linguistic features.

Database (Oxford). 2016 Apr 6;2016. doi: 10.1093/database/baw042. Print 2016.

CD-REST: a system for extracting chemical-induced disease relation in literature.

Database (Oxford). 2016 Mar 25;2016. doi: 10.1093/database/baw036. Print 2016.
