Extracting noun phrases for all of MEDLINE.

Authors

Bennett N A, He Q, Powell K, Schatz B R

Affiliation

CANIS-Community Architectures for Network Information Systems, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign 61820, USA.

Publication

Proc AMIA Symp. 1999:671-5.

Abstract

A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature.
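The three-module design described in the abstract (tokenization, part-of-speech tagging, noun phrase identification) and the precision/recall evaluation can be illustrated with a minimal sketch. The abstract does not specify the tagger, the phrase grammar, or the evaluation procedure, so everything below is an assumption made for illustration: the toy lexicon `LEXICON`, the pattern "optional determiner, any adjectives, one or more nouns", and the set-based `precision_recall` helper are hypothetical and are not the authors' implementation.

```python
import re

# Hypothetical toy lexicon; a real tagger would use a trained model.
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "of": "PREP", "in": "PREP", "from": "PREP",
    "chronic": "ADJ", "renal": "ADJ", "acute": "ADJ",
    "failure": "NOUN", "patients": "NOUN", "therapy": "NOUN", "risk": "NOUN",
    "increases": "VERB", "reduces": "VERB",
}


def tokenize(text):
    """Module 1: split text into words, numbers, and punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]", text)


def tag(tokens):
    """Module 2: assign a part-of-speech tag by lexicon lookup (assumed scheme)."""
    return [(tok, LEXICON.get(tok.lower(), "OTHER")) for tok in tokens]


def noun_phrases(tagged):
    """Module 3: collect runs matching DET? ADJ* NOUN+ (assumed pattern).

    The leading determiner, if any, is dropped from the returned phrase.
    """
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:  # at least one noun closes the phrase
            start = i + 1 if tagged[i][1] == "DET" else i
            phrases.append(" ".join(tok.lower() for tok, _ in tagged[start:k]))
            i = k
        else:
            i += 1
    return phrases


def precision_recall(extracted, gold):
    """Compare extracted phrases with a hand-checked gold set of phrases."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


sentence = "Chronic renal failure increases the risk in patients."
found = noun_phrases(tag(tokenize(sentence)))
print(found)
```

In this sketch, precision is the share of extracted phrases that appear in the gold set and recall is the share of gold phrases that were extracted. Evaluating all phrases from a sample of abstracts, as the abstract describes, would correspond to building the gold set by hand for that sample.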

Similar articles

4
Identifying important concepts from medical documents.
J Biomed Inform. 2006 Dec;39(6):668-79. doi: 10.1016/j.jbi.2006.02.001. Epub 2006 Mar 2.
5
Leveraging syntax to better capture the semantics of elliptical coordinated compound noun phrases.
J Biomed Inform. 2017 Aug;72:120-131. doi: 10.1016/j.jbi.2017.07.001. Epub 2017 Jul 4.
7
Information content in Medline record fields.
Int J Med Inform. 2004 Jun 30;73(6):515-27. doi: 10.1016/j.ijmedinf.2004.02.008.
8
Corpus-based statistical screening for phrase identification.
J Am Med Inform Assoc. 2000 Sep-Oct;7(5):499-511. doi: 10.1136/jamia.2000.0070499.
9
Informative Causality Extraction from Medical Literature via Dependency-Tree-Based Patterns.
J Healthc Inform Res. 2022 May 25;6(3):295-316. doi: 10.1007/s41666-022-00116-z. eCollection 2022 Sep.
10
Comparing and combining chunkers of biomedical text.
J Biomed Inform. 2011 Apr;44(2):354-60. doi: 10.1016/j.jbi.2010.10.005. Epub 2010 Nov 4.

Cited by

1
PubMed Phrases, an open set of coherent phrases for searching biomedical literature.
Sci Data. 2018 Jun 12;5:180104. doi: 10.1038/sdata.2018.104.
2
ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition.
Biomed Res Int. 2016;2016:4248026. doi: 10.1155/2016/4248026. Epub 2016 Jan 28.
3
Identifying well-formed biomedical phrases in MEDLINE® text.
J Biomed Inform. 2012 Dec;45(6):1035-41. doi: 10.1016/j.jbi.2012.05.005. Epub 2012 Jun 8.
6
Corpus-based statistical screening for phrase identification.
J Am Med Inform Assoc. 2000 Sep-Oct;7(5):499-511. doi: 10.1136/jamia.2000.0070499.

References

2
Taming MEDLINE with concept spaces.
Science. 1998 Sep 18;281(5384):1785. doi: 10.1126/science.281.5384.1785.
3
Information retrieval in digital libraries: bringing search to the net.
Science. 1997 Jan 17;275(5298):327-34. doi: 10.1126/science.275.5298.327.
5
Computer auditing of surgical operative reports written in English.
Proc Annu Symp Comput Appl Med Care. 1993:269-73.
6
A general natural-language text processor for clinical radiology.
J Am Med Inform Assoc. 1994 Mar-Apr;1(2):161-74. doi: 10.1136/jamia.1994.95236146.
7
Extending a natural language parser with UMLS knowledge.
Proc Annu Symp Comput Appl Med Care. 1991:194-8.
