Markó Kornél, Daumke Philipp, Schulz Stefan, Hahn Udo
Department of Medical Informatics, Freiburg University Hospital (http://www.imbi.uni-freiburg.de/medinf)
AMIA Annu Symp Proc. 2003;2003:425-9.
We consider three alternative procedures for automatically indexing medical documents, using MeSH thesaurus identifiers as the target units (document descriptors). Rather than taking complete words as the starting point of the indexing procedure, we propose morphologically plausible subwords as the basic units from which MeSH terms are derived. We describe the morphological segmentation and normalization procedures and the mappings from subwords to MeSH terms, and we discuss results from an evaluation carried out on a German-language corpus.