Hirschman Lynette, Yeh Alexander, Blaschke Christian, Valencia Alfonso
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA.
BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2105-6-S1-S1. Epub 2005 May 24.
The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented at a workshop held in Granada, Spain, on March 28-31, 2004. The articles collected in this BMC Bioinformatics supplement, entitled "A critical assessment of text mining methods in molecular biology", describe the BioCreAtIvE tasks, systems, and results, as well as their independent evaluation.
BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles.
The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved balanced precision and recall of 80% or better, which potentially makes them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches where knowledge extrapolation and interpretation are required. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. There are 22 articles in this special issue, six of which provide analyses of results or data quality for the data sets; among these is a novel inter-annotator consistency assessment for the test set used in task 2.
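As an illustration of what "balanced precision / recall" means for the gene normalization task, the following minimal sketch scores a system's predicted gene identifiers against a gold standard. It is not the official BioCreAtIvE scoring procedure: the function name `precision_recall_f1`, the (document, identifier) pair representation, and the identifiers themselves are hypothetical and chosen only for the example.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of (doc_id, gene_id) pairs.

    Assumption: a prediction counts as correct only if the same identifier
    appears in the gold standard for the same document.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up gold-standard and system output, not real database identifiers.
gold = {
    ("doc1", "GENE_A"), ("doc1", "GENE_B"),
    ("doc2", "GENE_C"), ("doc2", "GENE_D"),
    ("doc3", "GENE_E"),
}
predicted = {
    ("doc1", "GENE_A"), ("doc1", "GENE_B"),
    ("doc2", "GENE_C"), ("doc2", "GENE_X"),  # GENE_X is a false positive
    ("doc3", "GENE_E"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case four of the five predictions are correct and four of the five gold identifiers are found, so precision and recall are both 0.8, which is the kind of balanced operating point the abstract refers to.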