Rebholz-Schuhmann Dietrich, Kafkas Senay, Kim Jee-Hyub, Jimeno Yepes Antonio, Lewin Ian
Department of Computational Linguistics, University of Zurich, Zürich, Switzerland.
J Biomed Semantics. 2013 Sep 13;4(1):19. doi: 10.1186/2041-1480-4-19.
Named entity recognition (NER) is an essential step in automatic text processing pipelines. A number of solutions have been presented and evaluated against gold standard corpora (GSCs). Benchmarking against GSCs is crucial, but it is left to the individual researcher. Here we present a League Table web site that benchmarks NER solutions against selected public GSCs, maintains a ranked list, and archives the submitted annotated corpora for future comparisons.
The web site provides access to the different GSCs in a standardized format (IeXML). To make a submission, the user first describes the specification of the solution used and then uploads the annotated corpus for evaluation. The performance of the system is measured against one or more GSCs, and the results are then added to the web site (the "League Table"). The site currently displays the results of the publicly available NER solutions from the Whatizit infrastructure, which serve as a baseline for future comparisons.
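The evaluation and ranking step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the scoring procedure of the League Table itself: the abstract does not name the metrics or the matching criterion, so the sketch assumes exact matching of span boundaries and entity type, and ranking by F1. The names `Annotation`, `score`, and `league_table` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """One entity mention (hypothetical structure, not the IeXML schema)."""
    doc_id: str
    start: int
    end: int
    entity_type: str


def score(gold: Iterable[Annotation], predicted: Iterable[Annotation]) -> Dict[str, float]:
    """Precision, recall, and F1 under exact matching (an assumption)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def league_table(
    gold: Iterable[Annotation],
    submissions: Dict[str, List[Annotation]],
) -> List[Tuple[str, Dict[str, float]]]:
    """Score every submission against one GSC and sort by F1, best first."""
    gold_list = list(gold)
    rows = [(name, score(gold_list, preds)) for name, preds in submissions.items()]
    return sorted(rows, key=lambda row: row[1]["f1"], reverse=True)
```

Exact matching is the strictest common choice; a more lenient evaluation, for example one that accepts overlapping boundaries, would replace the set intersection in `score` with a custom matching function.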
The League Table enables the evaluation of NER solutions in a standardized infrastructure and monitors the results long-term. For access please go to http://wwwdev.ebi.ac.uk/Rebholz-srv/calbc/assessmentGSC/.