Monitoring named entity recognition: the League Table.

Authors

Rebholz-Schuhmann Dietrich, Kafkas Senay, Kim Jee-Hyub, Jimeno Yepes Antonio, Lewin Ian

Affiliation

Department of Computational Linguistics, University of Zurich, Zürich, Switzerland.

Publication

J Biomed Semantics. 2013 Sep 13;4(1):19. doi: 10.1186/2041-1480-4-19.

DOI: 10.1186/2041-1480-4-19
PMID: 24034148
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4015903/

Abstract

BACKGROUND

Named entity recognition (NER) is an essential step in automatic text processing pipelines. A number of solutions have been presented and evaluated against gold standard corpora (GSCs). Benchmarking against GSCs is crucial, but it is left to the individual researcher. We present a League Table web site that benchmarks NER solutions against selected public GSCs, maintains a ranked list, and archives the annotated corpus for future comparisons.

RESULTS

The web site provides access to the different GSCs in a standardized format (IeXML). To submit, the user describes the specification of the solution used and then uploads the annotated corpus for evaluation. The performance of the system is measured against one or more GSCs, and the results are then added to the web site (the "League Table"). It currently displays the results of publicly available NER solutions from the Whatizit infrastructure for future comparisons.
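
As a rough illustration of the benchmarking step described above, the following minimal sketch scores submitted annotations against a gold standard and ranks the systems. It assumes exact-match comparison of (document, start, end, type) tuples and ranking by F1; the abstract does not state which matching criterion or metric the League Table uses, and the names `evaluate`, `league_table`, `system_a`, and `system_b`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass

# An annotation is (document id, start offset, end offset, entity type).
# This tuple layout is an assumption made for the sketch, not the IeXML format.
Annotation = tuple[str, int, int, str]


@dataclass(frozen=True)
class Score:
    precision: float
    recall: float
    f1: float


def evaluate(predicted: set[Annotation], gold: set[Annotation]) -> Score:
    """Exact-match precision, recall, and F1 of predictions against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, f1)


def league_table(
    submissions: dict[str, set[Annotation]], gold: set[Annotation]
) -> list[tuple[str, Score]]:
    """Score every submission and rank by F1, best first (ties broken by name)."""
    scored = [(name, evaluate(anns, gold)) for name, anns in submissions.items()]
    return sorted(scored, key=lambda item: (-item[1].f1, item[0]))


# Hypothetical toy gold standard and two hypothetical system outputs.
gold = {
    ("doc1", 0, 5, "gene"),
    ("doc1", 10, 17, "disease"),
    ("doc2", 3, 9, "gene"),
    ("doc2", 20, 28, "chemical"),
}
submissions = {
    "system_a": {("doc1", 0, 5, "gene"), ("doc1", 10, 17, "disease"), ("doc2", 3, 9, "gene")},
    "system_b": {("doc1", 0, 5, "gene"), ("doc2", 4, 9, "gene")},  # second span is off by one
}

ranking = league_table(submissions, gold)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: P={score.precision:.2f} R={score.recall:.2f} F1={score.f1:.2f}")
```

Exact matching is the strictest choice: in the toy data, the off-by-one span from `system_b` counts as both a false positive and a false negative. A more lenient scheme, such as overlap-based matching, would change the scores and possibly the ranking.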

CONCLUSION

The League Table enables the evaluation of NER solutions in a standardized infrastructure and monitors the results over the long term. For access, please go to http://wwwdev.ebi.ac.uk/Rebholz-srv/calbc/assessmentGSC/.

CONTACT

rebholz@ifi.uzh.ch
