Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Authors

Thomas C. Wiegers, Allan Peter Davis, Carolyn J. Mattingly

Affiliation

Department of Biological Sciences, North Carolina State University, 139 David Clark Lab, Campus Box 7617, Raleigh, NC 27695-7617, USA

Publication information

Database (Oxford). 2014 Jun 10;2014. doi: 10.1093/database/bau050. Print 2014.

DOI: 10.1093/database/bau050
PMID: 24919658
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4207221/
Abstract

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61%, 74% and 51%, respectively. Response times ranged from fractions of a second to over a minute per article. We present a description of the challenge and a summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/
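The abstract describes CTD's pipeline exchanging documents with remote NER Web services as BioC XML over HTTP. As an illustration only, the Python sketch below wraps an article's title and abstract in a minimal BioC collection, posts it to a service, and reads annotations back. The element names (`collection`, `document`, `passage`, `infon`, `offset`, `annotation`, `location`) follow the general BioC layout; the endpoint URL, the use of HTTP POST with an `application/xml` body, the `type` infon values, the placeholder `source`/`date`/`key` values and the one-character gap between passage offsets are assumptions, not the challenge's actual interface.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_bioc_request(doc_id: str, title: str, abstract: str) -> bytes:
    """Wrap one article as a minimal BioC collection with title and abstract passages."""
    collection = ET.Element("collection")
    # Placeholder collection metadata (assumed values, not the challenge's).
    ET.SubElement(collection, "source").text = "CTD"
    ET.SubElement(collection, "date").text = "20140101"
    ET.SubElement(collection, "key").text = "ctd_ner.key"
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    offset = 0
    for passage_type, text in (("title", title), ("abstract", abstract)):
        passage = ET.SubElement(document, "passage")
        ET.SubElement(passage, "infon", key="type").text = passage_type
        ET.SubElement(passage, "offset").text = str(offset)
        ET.SubElement(passage, "text").text = text
        # Assumed convention: consecutive passages are separated by one character.
        offset += len(text) + 1
    return ET.tostring(collection, encoding="utf-8", xml_declaration=True)


def parse_annotations(bioc_xml: bytes) -> List[Tuple[str, str, str, int, int]]:
    """Return (doc_id, entity_type, text, offset, length) for every BioC annotation."""
    root = ET.fromstring(bioc_xml)
    results = []
    for document in root.iter("document"):
        doc_id = document.findtext("id", default="")
        # iter() also reaches annotations nested inside passages.
        for annotation in document.iter("annotation"):
            entity_type = annotation.findtext("infon[@key='type']", default="")
            location = annotation.find("location")
            offset = int(location.get("offset", "-1")) if location is not None else -1
            length = int(location.get("length", "0")) if location is not None else 0
            text = annotation.findtext("text", default="")
            results.append((doc_id, entity_type, text, offset, length))
    return results


def call_ner_service(url: str, payload: bytes, timeout: float = 120.0) -> Tuple[bytes, float]:
    """POST a BioC payload to a hypothetical NER endpoint; return (body, seconds elapsed)."""
    request = urllib.request.Request(
        url, data=payload, method="POST", headers={"Content-Type": "application/xml"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return body, time.perf_counter() - start
```

In use, a pipeline would pass `build_bioc_request(...)` to `call_ner_service` and hand the returned body to `parse_annotations`; the elapsed time is one way to capture a per-article response time like the one reported above. Keeping the wire format to plain BioC means the pipeline never needs to know how each remote tool is implemented, which is the interoperability point the abstract makes.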
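The abstract characterizes CTD's text-mining pipeline as asynchronous and batch-oriented, with response times measured per article. A minimal sketch of that pattern, assuming one remote endpoint per entity type (the `SERVICES` URLs are hypothetical placeholders) and an injected `call` function that performs the actual HTTP request, fans every article out to every service on a thread pool and keeps per-call timings. Failed calls are recorded instead of aborting the batch; enforcing timeouts is left to `call`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (article_id, entity_type)

# Hypothetical service registry: one remote NER endpoint per entity type.
SERVICES: Dict[str, str] = {
    "Gene": "https://example.org/ner/gene",
    "Chemical": "https://example.org/ner/chemical",
    "Disease": "https://example.org/ner/disease",
}


def run_batch(
    articles: Dict[str, bytes],
    services: Dict[str, str],
    call: Callable[[str, bytes], bytes],
    max_workers: int = 8,
) -> Tuple[Dict[Key, bytes], Dict[Key, float], Dict[Key, str]]:
    """Send every article payload to every service concurrently.

    Returns (responses, timings, errors), each keyed by (article_id, entity_type).
    """

    def timed_call(url: str, payload: bytes) -> Tuple[bytes, float]:
        # Time only the remote call itself, not the time spent queued in the pool.
        start = time.perf_counter()
        body = call(url, payload)
        return body, time.perf_counter() - start

    responses: Dict[Key, bytes] = {}
    timings: Dict[Key, float] = {}
    errors: Dict[Key, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(timed_call, url, payload): (article_id, entity_type)
            for article_id, payload in articles.items()
            for entity_type, url in services.items()
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                body, elapsed = future.result()
            except Exception as exc:
                # Record the failure so one bad service does not abort the batch.
                errors[key] = repr(exc)
                continue
            responses[key] = body
            timings[key] = elapsed
    return responses, timings, errors
```

Injecting `call` keeps the batching logic independent of the transport, so the same loop can be exercised with a stub before pointing it at real services.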
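The abstract reports recall, precision and balanced F-scores. The balanced F-score is the harmonic mean of precision P and recall R, F = 2PR/(P + R). The sketch below computes all three by pooling annotations across documents (micro-averaging) and counting exact matches on a (document_id, concept) key. The abstract does not state the challenge's matching criteria, so both the key and the pooling are assumptions.

```python
from typing import Iterable, Set, Tuple

# Hypothetical matching key: (document_id, normalized concept identifier).
Annotation = Tuple[str, str]


def score(predicted: Iterable[Annotation], gold: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Return (precision, recall, balanced F-score) using exact set matching."""
    pred_set: Set[Annotation] = set(predicted)
    gold_set: Set[Annotation] = set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, if a service returns three annotations, two of which appear among four gold-standard annotations, then P = 2/3, R = 1/2 and F = 4/7 ≈ 0.571.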

Similar articles

1. Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.
   Database (Oxford). 2014 Jun 10;2014. doi: 10.1093/database/bau050. Print 2014.
2. Collaborative biocuration--text-mining development task for document prioritization for curation.
   Database (Oxford). 2012 Nov 22;2012:bas037. doi: 10.1093/database/bas037. Print 2012.
3. tmBioC: improving interoperability of text-mining tools with BioC.
   Database (Oxford). 2014 Jul 25;2014. doi: 10.1093/database/bau073. Print 2014.
4. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.
   PLoS One. 2013 Apr 17;8(4):e58201. doi: 10.1371/journal.pone.0058201. Print 2013.
5. BioCreative V CDR task corpus: a resource for chemical disease relation extraction.
   Database (Oxford). 2016 May 9;2016. doi: 10.1093/database/baw068. Print 2016.
6. Mining chemical patents with an ensemble of open systems.
   Database (Oxford). 2016 May 12;2016. doi: 10.1093/database/baw065. Print 2016.
7. BioC viewer: a web-based tool for displaying and merging annotations in BioC.
   Database (Oxford). 2016 Aug 10;2016. doi: 10.1093/database/baw106. Print 2016.
8. BioCreative III interactive task: an overview.
   BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.
9. Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.
   Database (Oxford). 2012 Dec 5;2012:bas050. doi: 10.1093/database/bas050. Print 2012.
10. A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.
   Database (Oxford). 2013 Nov 28;2013:bat080. doi: 10.1093/database/bat080. Print 2013.

Cited by

1. Integrating AI-powered text mining from PubTator into the manual curation workflow at the Comparative Toxicogenomics Database.
   Database (Oxford). 2025 Feb 21;2025. doi: 10.1093/database/baaf013.
2. TopicTracker - An advanced software pipeline for text mining on PubMed data: Bridging the gap between off-the-shelf tools and code based approaches.
   Heliyon. 2024 Aug 15;10(17):e36351. doi: 10.1016/j.heliyon.2024.e36351. eCollection 2024 Sep 15.
