Rindflesch T C, Tanabe L, Weinstein J N, Hunter L
Lister Hill Center, National Library of Medicine, Bethesda, MD 20894, USA.
Pac Symp Biocomput. 2000:517-28. doi: 10.1142/9789814447331_0049.
EDGAR (Extraction of Drugs, Genes and Relations) is a natural language processing system that extracts information about drugs and genes relevant to cancer from the biomedical literature. This automatically extracted information has remarkable potential to facilitate computational analysis in the molecular biology of cancer, and the technology generalizes readily to many other areas of biomedicine. This paper reports on the mechanisms for automatically generating assertions about drugs and genes and on a simple application: conceptual clustering of documents. The system uses a stochastic part-of-speech tagger, generates an underspecified syntactic parse, and then uses semantic and pragmatic information to construct its assertions. It builds on two important existing resources: the MEDLINE database of biomedical citations and abstracts, and the Unified Medical Language System (UMLS), which provides syntactic and semantic information about the terms found in biomedical abstracts.
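The general idea of turning text into drug-gene assertions and then grouping documents by shared concepts can be illustrated with a toy sketch. This is not EDGAR's implementation: the abstract describes a stochastic tagger, an underspecified parse, and UMLS-based semantic interpretation, whereas the sketch below substitutes plain sentence-level co-occurrence. The `LEXICON` table is a hypothetical stand-in for UMLS semantic types, the function names are invented for illustration, and the example sentences are made up rather than taken from MEDLINE.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon standing in for UMLS semantic types (assumption).
LEXICON = {
    "cisplatin": "drug",
    "tamoxifen": "drug",
    "p53": "gene",
    "bcl-2": "gene",
}


def find_concepts(text):
    """Return (term, semantic_type) pairs from LEXICON found in text, in order."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
    found = []
    for tok in tokens:
        pair = (tok, LEXICON.get(tok))
        if pair[1] is not None and pair not in found:
            found.append(pair)
    return found


def extract_assertions(text):
    """Pair every drug with every gene mentioned in the same sentence.

    Co-occurrence is a crude substitute for the syntactic and semantic
    analysis described in the abstract; it only records co-mention.
    """
    assertions = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        concepts = find_concepts(sentence)
        drugs = [term for term, kind in concepts if kind == "drug"]
        genes = [term for term, kind in concepts if kind == "gene"]
        assertions.update((d, g) for d in drugs for g in genes)
    return assertions


def cluster_by_concept(docs):
    """Group document ids under each lexicon concept they mention."""
    clusters = defaultdict(set)
    for doc_id, text in docs.items():
        for term, _kind in find_concepts(text):
            clusters[term].add(doc_id)
    return dict(clusters)


# Invented toy sentences, not real MEDLINE abstracts.
DOCS = {
    "doc1": "Cisplatin induces p53 expression. Cell growth was measured.",
    "doc2": "Tamoxifen reduced bcl-2 levels in tumor cells.",
    "doc3": "Loss of p53 was observed. Tamoxifen was not tested.",
}

if __name__ == "__main__":
    for doc_id, text in DOCS.items():
        print(doc_id, sorted(extract_assertions(text)))
    print(cluster_by_concept(DOCS))
```

In this sketch `doc3` yields no assertion because its drug and gene appear in different sentences, yet it still joins both the `p53` and `tamoxifen` clusters. That separation between relation extraction and concept-based grouping is the only point the example is meant to show.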