

Information extraction for enhanced access to disease outbreak reports.

Authors

Grishman Ralph, Huttunen Silja, Yangarber Roman

Affiliation

Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10003-6806, USA.

Publication information

J Biomed Inform. 2002 Aug;35(4):236-46. doi: 10.1016/s1532-0464(03)00013-3.

Abstract

Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
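The pipeline the abstract describes (pattern-based extraction into a table, then selection and sorting over that table) can be illustrated with a minimal sketch. This is not Proteus-BIO's implementation: the word classes, the single regular-expression pattern, the `Outbreak` record, the function names, and the sample documents are all hypothetical, chosen only to show how a tabular index with links back to source documents might support database-style queries.

```python
import re
from dataclasses import dataclass

# Hypothetical word classes; the abstract does not list the real ones.
DISEASES = ["cholera", "ebola", "measles"]
LOCATIONS = ["Uganda", "Peru", "India"]

# One hypothetical pattern: "<N> cases of <disease> ... in <location>".
PATTERN = re.compile(
    r"(?P<cases>\d+) cases of (?P<disease>%s)\b.*?\bin (?P<location>%s)\b"
    % ("|".join(DISEASES), "|".join(LOCATIONS)),
    re.IGNORECASE,
)


@dataclass
class Outbreak:
    disease: str
    location: str
    cases: int
    doc_id: str  # link back to the document describing the outbreak


def extract(doc_id, text):
    """Return one table row per pattern match found in the document."""
    return [
        Outbreak(m["disease"].lower(), m["location"], int(m["cases"]), doc_id)
        for m in PATTERN.finditer(text)
    ]


def build_table(docs):
    """Run extraction over every document and collect the rows."""
    table = []
    for doc_id, text in docs.items():
        table.extend(extract(doc_id, text))
    return table


# Made-up sample documents standing in for crawled reports.
docs = {
    "doc1": "Officials reported 120 cases of cholera in Peru last week.",
    "doc2": "There were 14 cases of Ebola confirmed in Uganda.",
    "doc3": "The city council met to discuss road repairs.",
}

table = build_table(docs)

# Selection: keep only the rows for one disease.
cholera_rows = [row for row in table if row.disease == "cholera"]

# Sorting: order all rows by case count, largest first.
by_cases = sorted(table, key=lambda row: row.cases, reverse=True)
```

Because every row carries a `doc_id`, any selection or sort over the table leads straight back to the underlying reports, which is the access path the abstract contrasts with term-based search.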

