Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm.

Authors

Pérez-Pérez Martin, Pérez-Rodríguez Gael, Blanco-Míguez Aitor, Fdez-Riverola Florentino, Valencia Alfonso, Krallinger Martin, Lourenço Anália

Affiliations

Department of Computer Science, ESEI, University of Vigo, Campus As Lagoas, 32004, Ourense, Spain.

The Biomedical Research Centre (CINBIO), Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain.

Publication information

J Cheminform. 2019 Jun 24;11(1):42. doi: 10.1186/s13321-019-0363-6.

DOI:10.1186/s13321-019-0363-6
PMID:31236786
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6591930/
Abstract

BACKGROUND

Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents in real time. To address this pressing need, a novel track called "Technical interoperability and performance of annotation servers" was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications.

RESULTS

A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period.
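
The technical metrics reported above (median response time, median annotations per document) are simple aggregates over per-request logs. The following is a minimal sketch of how such a summary could be computed; it is not the BeCalm platform's implementation. The `RequestLog` record, its field names, the `summarize` helper, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestLog:
    """One annotation request. All field names are hypothetical."""
    server: str
    response_seconds: float
    annotations: int
    succeeded: bool


def summarize(logs):
    """Return median response time, median annotations per document,
    and the fraction of successful requests, or None for empty input."""
    ok = [r for r in logs if r.succeeded]
    if not ok:
        return None
    return {
        # Medians are taken over successful requests only (an assumption).
        "median_response_seconds": median(r.response_seconds for r in ok),
        "median_annotations": median(r.annotations for r in ok),
        "success_rate": len(ok) / len(logs),
    }


# Made-up sample data, not taken from the paper.
sample = [
    RequestLog("server-a", 1.2, 8, True),
    RequestLog("server-a", 3.0, 12, True),
    RequestLog("server-b", 2.0, 10, True),
    RequestLog("server-b", 9.9, 0, False),
]
summary = summarize(sample)
print(summary)
```

Medians rather than means are a natural choice for this kind of data, since a few very slow or failed requests would otherwise dominate the summary.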

CONCLUSIONS

The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/5385c62dfdf8/13321_2019_363_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/aab40a58fd94/13321_2019_363_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/ae9096663d5e/13321_2019_363_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/3e9b20e3a3ef/13321_2019_363_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/d2c8793a892e/13321_2019_363_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/689b/6591930/62eba6715d8c/13321_2019_363_Fig6_HTML.jpg
