Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Author information

Wiegers Thomas C, Davis Allan Peter, Mattingly Carolyn J

Affiliations

Department of Biological Sciences, North Carolina State University, 139 David Clark Lab, Campus Box 7617, Raleigh, NC 27695-7617, USA

Publication information

Database (Oxford). 2014 Jun 10;2014. doi: 10.1093/database/bau050. Print 2014.

DOI:10.1093/database/bau050

PMID:24919658

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4207221/

Abstract

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions of a second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/
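The evaluation step described above compares the entities each Web service returns in BioC XML against manually curated articles and reports recall, precision and the balanced F-score, F = 2PR / (P + R). The Python sketch below illustrates that kind of scoring using only the standard library. It is a hypothetical illustration, not CTD's evaluation code: it assumes a match is a unique (document id, entity type, lower-cased text) tuple, which may differ from the challenge's actual matching rules; it skips the HTTP call and starts from a BioC response string; and the sample document and answer key are invented toy data, not challenge data.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def extract_entities(bioc_xml: str) -> set[tuple[str, str, str]]:
    """Collect unique (document id, entity type, lower-cased text) tuples
    from every <annotation> in a BioC collection (assumed matching key)."""
    root = ET.fromstring(bioc_xml)
    found = set()
    for doc in root.iter("document"):
        doc_id = doc.findtext("id", default="")
        for ann in doc.iter("annotation"):
            # BioC records the entity class in an <infon key="type"> child.
            ent_type = next(
                (inf.text or "" for inf in ann.findall("infon")
                 if inf.get("key") == "type"),
                "",
            )
            text = (ann.findtext("text") or "").strip().lower()
            found.add((doc_id, ent_type, text))
    return found


def score(predicted: set, gold: set) -> dict[str, float]:
    """Recall, precision and balanced F-score (harmonic mean of P and R)."""
    tp = len(predicted & gold)  # items both returned and curated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Toy response that a hypothetical NER service might return (invented text).
RESPONSE = """<collection>
  <source>PubMed</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>Cisplatin induced nephrotoxicity and weight loss in rats.</text>
      <annotation id="T1">
        <infon key="type">Chemical</infon>
        <location offset="0" length="9"/>
        <text>Cisplatin</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Disease</infon>
        <location offset="18" length="14"/>
        <text>nephrotoxicity</text>
      </annotation>
    </passage>
  </document>
</collection>"""

# Toy curated answer key; "weight loss" is the item the service missed.
GOLD = {
    ("doc1", "Chemical", "cisplatin"),
    ("doc1", "Disease", "nephrotoxicity"),
    ("doc1", "Disease", "weight loss"),
}

result = score(extract_entities(RESPONSE), GOLD)
print({k: round(v, 3) for k, v in result.items()})
# {'precision': 1.0, 'recall': 0.667, 'f_score': 0.8}
```

Keeping parsing and scoring as separate functions mirrors the decoupling the abstract argues for: the scorer only sees sets of tuples, so it does not care which remote service, operating system or toolchain produced the BioC document.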