Verspoor Karin
Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Comp Funct Genomics. 2005;6(1-2):61-6. doi: 10.1002/cfg.451.
This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
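The two steps summarized above (keeping only terms present in both resources, then measuring token coverage over a corpus) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `build_lexicon` and `token_coverage`, the dictionary layout, and the toy entries are all assumptions, and real UMLS data would first have to be loaded from the licensed distribution files.

```python
from collections import Counter


def build_lexicon(specialist, metathesaurus):
    """Keep only terms found in both resources (hypothetical helper).

    specialist:    term -> morphosyntactic info (here, a part-of-speech string)
    metathesaurus: term -> semantic info (here, a list of semantic type labels)
    """
    shared = specialist.keys() & metathesaurus.keys()
    return {
        term: {"pos": specialist[term], "semantic_types": metathesaurus[term]}
        for term in shared
    }


def token_coverage(tokens, lexicon):
    """Fraction of corpus tokens (not distinct types) found in the lexicon."""
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for token, n in counts.items() if token in lexicon)
    return covered / total


# Toy entries for illustration only; these are not actual UMLS records.
specialist = {"protein": "noun", "bind": "verb", "the": "det"}
metathesaurus = {
    "protein": ["Amino Acid, Peptide, or Protein"],
    "bind": ["Molecular Function"],
    "kinase": ["Enzyme"],
}

lexicon = build_lexicon(specialist, metathesaurus)
tokens = ["The", "protein", "kinase", "can", "bind", "protein"]
coverage = token_coverage(tokens, lexicon)
```

Counting tokens rather than distinct word types means frequent terms weigh more heavily, which matches the abstract's framing of coverage in terms of the most frequent terms in the domain. The sketch matches single lowercased tokens only; multi-word terms would need a separate matching step.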