Srinivasan Suresh, Rindflesch Thomas C, Hole William T, Aronson Alan R, Mork James G
National Library of Medicine, Bethesda, MD, USA.
Proc AMIA Symp. 2002:727-31.
The entire collection of 11.5 million MEDLINE abstracts was processed with a shallow syntactic parser to extract 549 million noun phrases. English-language strings in the 2001 and 2002 releases of the UMLS Metathesaurus were then matched against these phrases using flexible matching techniques. Of the Metathesaurus names, 34% (occurring in 30% of the concepts) were found in the titles and abstracts of articles in the literature. The matching concepts are split fairly evenly between chemical and non-chemical concepts and span a wide spectrum of semantic types. This paper details the approach taken and the results of the analysis.
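The matching step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify which flexible matching techniques were used, so the normalization here (lowercasing, dropping punctuation, ignoring word order) is an assumed stand-in, not the authors' method. The function names and the concept identifiers used as dictionary keys are hypothetical.

```python
import re


def normalize(text):
    """Reduce a string to a case-, punctuation-, and word-order-insensitive key.

    Assumption: this stands in for the unspecified "flexible matching"
    described in the abstract.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(tokens))


def coverage(names_by_concept, noun_phrases):
    """Report the share of names and of concepts found among the noun phrases.

    names_by_concept: dict mapping a (hypothetical) concept ID to its names.
    noun_phrases: iterable of noun phrases extracted from the abstracts.
    """
    phrase_keys = {normalize(p) for p in noun_phrases}
    phrase_keys.discard("")  # ignore phrases with no alphanumeric tokens

    total_names = 0
    matched_names = 0
    matched_concepts = set()
    for concept_id, names in names_by_concept.items():
        for name in names:
            total_names += 1
            if normalize(name) in phrase_keys:
                matched_names += 1
                matched_concepts.add(concept_id)

    return {
        "name_coverage": matched_names / total_names if total_names else 0.0,
        "concept_coverage": (
            len(matched_concepts) / len(names_by_concept)
            if names_by_concept
            else 0.0
        ),
        "matched_concepts": matched_concepts,
    }
```

A concept counts as matched when at least one of its names is found, which is why name coverage and concept coverage can differ, as they do in the reported 34% and 30% figures. At the scale described (hundreds of millions of phrases), the in-memory set would likely need to be replaced by a disk-backed or sharded index.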