Zieman Y L, Bleich H L
Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, USA.
Proc AMIA Annu Fall Symp. 1997:519-22.
This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2×21, 6×7, 3×14, or 2×3×7--so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2×3×7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings.
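The factoring idea can be illustrated with a minimal sketch. This is not the SENSE implementation, whose lexicon and matching rules are not given in the abstract. The lexicon `LEXICON`, the term list `TOY_TERMS`, the greedy longest-match segmentation, and the Jaccard overlap score are all assumptions made for illustration only.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy lexicon: surface morpheme or word -> primary semantic factor.
LEXICON: Dict[str, str] = {
    "reno": "kidney", "renal": "kidney", "kidney": "kidney",
    "vascular": "blood vessel", "vessel": "blood vessel",
    "hyper": "high", "high": "high",
    "tension": "pressure", "pressure": "pressure",
    "cardio": "heart", "heart": "heart",
    "myo": "muscle", "muscle": "muscle",
    "pathy": "disease", "disease": "disease",
}

# Hypothetical miniature vocabulary standing in for MeSH.
TOY_TERMS: List[str] = [
    "renovascular hypertension",
    "hypertension",
    "cardiomyopathy",
]


def factor(text: str) -> FrozenSet[str]:
    """Split text into morphemes (greedy longest match) and map them to factors."""
    factors = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        i = 0
        while i < len(word):
            # Try the longest lexicon entry that starts at position i.
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in LEXICON:
                    factors.add(LEXICON[piece])
                    i = j
                    break
            else:
                i += 1  # Nothing in the lexicon starts here; skip one character.
    return frozenset(factors)


def match(query: str) -> List[Tuple[str, float]]:
    """Rank toy terms by Jaccard overlap between query factors and term factors."""
    q = factor(query)
    scored = []
    for term in TOY_TERMS:
        t = factor(term)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score > 0:
            scored.append((term, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(sorted(factor("renovascular hypertension")))
    print(match("high pressure in kidney vessels"))
    print(match("heart muscle disease"))
```

In this toy setting, the query "high pressure in kidney vessels" shares no word with "renovascular hypertension", yet both reduce to the same four factors (kidney, blood vessel, high, pressure), so they match at the concept level rather than the word level.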