Overview of BioCreAtIvE: critical assessment of information extraction for biology.

Author information

Lynette Hirschman, Alexander Yeh, Christian Blaschke, Alfonso Valencia

Affiliation

The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2105-6-S1-S1. Epub 2005 May 24.

DOI:10.1186/1471-2105-6-S1-S1

PMID:15960821

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1869002/

Abstract

BACKGROUND

The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented at a workshop held in Granada, Spain, on March 28-31, 2004. The articles collected in this BMC Bioinformatics supplement, entitled "A critical assessment of text mining methods in molecular biology", describe the BioCreAtIvE tasks, systems, results and their independent evaluation.

RESULTS

BioCreAtIvE focused on two tasks. The first dealt with the extraction of gene or protein names from text and their mapping to standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed functional annotation: given full-text articles, systems had to identify the specific text passages that supported Gene Ontology annotations for specific proteins.
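The first task combines two steps: finding gene or protein mentions in text and mapping each mention to a database identifier. The mapping step can be illustrated with a minimal sketch based on a dictionary lookup over normalized name variants. It assumes a hypothetical synonym table (`RAW_SYNONYMS`) with placeholder identifiers such as `GENE:0001`; it is not the method of any BioCreAtIvE participant, and real systems would draw their synonyms from the fly, mouse and yeast databases.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table: placeholder identifiers, not real database entries.
RAW_SYNONYMS: Dict[str, List[str]] = {
    "GENE:0001": ["Abc1", "ABC-1 protein"],
    "GENE:0002": ["Xyz kinase", "XYZK"],
}


def normalize_name(name: str) -> str:
    """Lowercase and drop spaces, hyphens and underscores so that
    spelling variants such as 'ABC-1' and 'abc 1' share one key."""
    return re.sub(r"[\s\-_]+", "", name.lower())


def build_lexicon(raw: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the synonym table: normalized name -> set of identifiers.
    A set is kept because one name can belong to several genes."""
    lexicon: Dict[str, Set[str]] = {}
    for gene_id, names in raw.items():
        for name in names:
            lexicon.setdefault(normalize_name(name), set()).add(gene_id)
    return lexicon


LEXICON = build_lexicon(RAW_SYNONYMS)


def map_mentions(mentions: Iterable[str]) -> Set[str]:
    """Return the identifiers of all mentions found in the lexicon;
    mentions with no lexicon entry are skipped."""
    ids: Set[str] = set()
    for mention in mentions:
        ids |= LEXICON.get(normalize_name(mention), set())
    return ids


if __name__ == "__main__":
    # 'abc 1' and 'xyz-kinase' match after normalization; 'unknown' does not.
    print(sorted(map_mentions(["abc 1", "xyz-kinase", "unknown"])))
    # prints ['GENE:0001', 'GENE:0002']
```

When a normalized name is shared by several genes, this sketch returns every candidate identifier; choosing among them (for example from surrounding context) is a hard part of the real task and is left out here.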

CONCLUSION

The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved balanced precision and recall of 80% or better, potentially making them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches in cases that require knowledge extrapolation and interpretation. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. This special issue contains 22 articles; six of them analyze the results or the quality of the data sets, including a novel inter-annotator consistency assessment of the test set used in task 2.
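The 80% figure for the best systems rests on two standard ratios: precision (the share of a system's answers that are correct) and recall (the share of correct answers that the system finds), commonly summarized by their harmonic mean, the F-measure. As a minimal sketch, these values can be computed micro-averaged over per-document sets of gene identifiers. The document and gene IDs are hypothetical, and this is a generic scorer, not the official BioCreAtIvE evaluation script.

```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              predicted: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over documents.

    Each dict maps a document ID to the set of gene identifiers for that
    document (gold standard vs. system output). Counts are pooled across
    all documents before the ratios are taken.
    """
    tp = fp = fn = 0
    # Iterate over every document that appears in either dict.
    for doc in gold.keys() | predicted.keys():
        g = gold.get(doc, set())
        p = predicted.get(doc, set())
        tp += len(g & p)   # identifiers the system got right
        fp += len(p - g)   # identifiers the system reported wrongly
        fn += len(g - p)   # gold identifiers the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


if __name__ == "__main__":
    gold = {"doc1": {"G1", "G2", "G3", "G4", "G5"}}
    pred = {"doc1": {"G1", "G2", "G3", "G4", "X9"}}
    # 4 of 5 answers correct and 4 of 5 gold IDs found: precision = recall = 0.8
    print(micro_prf(gold, pred))
```

In this toy case precision and recall are equal, which is what "balanced" means; a system can also trade one for the other, and the F-measure penalizes large gaps between them.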

Similar articles

Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S1. doi: 10.1186/gb-2008-9-s2-s1. Epub 2008 Sep 1.

Overview of the BioCreative III Workshop.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-12-S8-S1.

Evaluation of BioCreAtIvE assessment of task 2.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-6-S1-S16. Epub 2005 May 24.

An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S17. doi: 10.1186/1471-2105-6-S1-S17. Epub 2005 May 24.

BioCreative III interactive task: an overview.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.

An Overview of BioCreative II.5.

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):385-99. doi: 10.1109/tcbb.2010.61.

Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. Epub 2008 Sep 1.

BioCreAtIvE task 1A: gene mention finding evaluation.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2105-6-S1-S2. Epub 2005 May 24.

BioCreative V CDR task corpus: a resource for chemical disease relation extraction.

Database (Oxford). 2016 May 9;2016. doi: 10.1093/database/baw068. Print 2016.

Cited by

Do LLMs Surpass Encoders for Biomedical NER?

Proc (IEEE Int Conf Healthc Inform). 2025 Jun;2025:352-358. doi: 10.1109/ICHI64645.2025.00048. Epub 2025 Jul 22.

Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature.

Sci Data. 2024 Sep 27;11(1):1032. doi: 10.1038/s41597-024-03841-9.

Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach.

Database (Oxford). 2024 Aug 28;2024. doi: 10.1093/database/baae079.

Towards discovery: an end-to-end system for uncovering novel biomedical relations.

Database (Oxford). 2024 Jul 11;2024. doi: 10.1093/database/baae057.

Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition.

J Proteome Res. 2024 Jun 7;23(6):1915-1925. doi: 10.1021/acs.jproteome.3c00367. Epub 2024 May 11.

Overview of the COVID-19 text mining tool interactive demonstration track in BioCreative VII.

Database (Oxford). 2022 Oct 5;2022. doi: 10.1093/database/baac084.

Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers.

Database (Oxford). 2022 Sep 2;2022. doi: 10.1093/database/baac071.

Pre-trained models, data augmentation, and ensemble learning for biomedical information extraction and document classification.

Database (Oxford). 2022 Aug 13;2022. doi: 10.1093/database/baac066.

Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking.

Proc IEEE Int Conf Big Data. 2018 Dec;2018:1494-1503. doi: 10.1109/bigdata.2018.8622637. Epub 2019 Jan 24.

A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature.

Methods Mol Biol. 2022;2496:141-157. doi: 10.1007/978-1-0716-2305-3_8.
