Baud R H, Lovis C, Ruch P, Rassinoux A M
Medical Informatics Division, University Hospital of Geneva, Switzerland.
Stud Health Technol Inform. 2000;77:456-61.
Processing medical texts is burdensome without a toolset designed for simple operations such as recognizing morphological variants, updating and accessing a word dictionary of the domain, and segmenting words that contain multiple morpho-semantems. The apparent simplicity of these basic operations is an illusion: it soon becomes clear that a quality implementation is a long-term task. Coherence between subtasks may be lacking unless strict rules are enforced. In practice, good tools are rarely available, or they have not been tailored to the medical profession. This paper aims to define a complete toolset for medical word processing. In addition, it provides relevant examples of the difficulties inherent in this task. Finally, it reports typical results that can be expected from an industry-standard implementation.
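The three basic operations named in the abstract (variant recognition, dictionary update and access, and segmentation) can be illustrated with a minimal Python sketch. This is not the authors' toolset: the lexicon `MORPHEMES`, the suffix rules in `VARIANT_RULES`, and the functions `add_entry`, `normalize`, and `segment` are all hypothetical names and data chosen for illustration, and the backtracking segmenter is only one simple way such a step might be approached.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping morphemes to glosses (not from the paper).
MORPHEMES: Dict[str, str] = {
    "gastr": "stomach",
    "enter": "intestine",
    "hepat": "liver",
    "itis": "inflammation",
    "ectomy": "surgical removal",
    "o": "",  # linking vowel, carries no meaning of its own
}

# Hypothetical suffix rewrite rules for a few plural variants.
VARIANT_RULES: List[Tuple[str, str]] = [
    ("itides", "itis"),
    ("ectomies", "ectomy"),
]


def add_entry(morpheme: str, gloss: str) -> None:
    """Dictionary update: add or overwrite a morpheme in the lexicon."""
    MORPHEMES[morpheme.lower()] = gloss


def normalize(word: str) -> str:
    """Variant recognition: map a known plural variant to its base form."""
    w = word.lower()
    for suffix, replacement in VARIANT_RULES:
        if w.endswith(suffix):
            return w[: -len(suffix)] + replacement
    return w


def segment(word: str, pos: int = 0) -> Optional[List[str]]:
    """Segmentation: split a word into lexicon morphemes by backtracking.

    Longer morphemes are tried first. Returns None if no complete
    segmentation exists with the current lexicon.
    """
    if pos == len(word):
        return []
    for m in sorted(MORPHEMES, key=len, reverse=True):
        if word.startswith(m, pos):
            rest = segment(word, pos + len(m))
            if rest is not None:
                return [m] + rest
    return None


if __name__ == "__main__":
    print(segment(normalize("gastroenteritis")))
    print(segment(normalize("Gastritides")))
    print(segment("cardiology"))  # unknown until the lexicon is extended
    add_entry("cardi", "heart")
    add_entry("logy", "study")
    print(segment("cardiology"))
```

Even in this toy form, the subtasks depend on each other: segmentation only succeeds once the variant has been normalized and the dictionary holds every morpheme involved, which hints at why the abstract stresses coherence between subtasks.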