Franz P, Zaiss A, Schulz S, Hahn U, Klar R
Freiburg University Hospital, Department of Medical Informatics.
Proc AMIA Symp. 2000:250-4.
In Germany, new legal requirements have raised the importance of accurately encoding admission and discharge diagnoses for inpatients and outpatients. In response to the emerging need for computer-supported tools, we examined three methods for the automated coding of German-language free-text diagnosis phrases. We first compared a language-independent, lexicon-free n-gram approach with an approach that uses a dictionary of medical morphemes and refines the query by mapping it to SNOMED codes. Both techniques produced a ranked list of candidate diagnoses within a vector space retrieval framework. The results did not reveal any significant difference: the correct diagnosis was found in approximately 40% of cases for three-digit codes and 30% for four-digit codes. The lexicon-based method was then modified by replacing the vector space ranking with a heuristic approach that capitalizes on the semantic structure of SNOMED, which raised the number of correct diagnoses significantly (approximately 50% for three-digit codes and 40% for four-digit codes). We therefore claim that lexicon-based retrieval methods do not perform better than lexicon-free ones unless conceptual knowledge is added.
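The lexicon-free idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes character trigrams, raw-count vectors and cosine similarity, none of which the abstract specifies. The names `trigrams`, `cosine`, `rank_codes` and the three-entry `CATALOGUE` are hypothetical and serve only as toy data.

```python
from collections import Counter
from math import sqrt

# Toy catalogue (assumption): code -> short German label, for illustration only.
CATALOGUE = {
    "I21.9": "Akuter Myokardinfarkt",
    "J18.9": "Pneumonie",
    "E11.9": "Diabetes mellitus Typ 2",
}

def trigrams(text, n=3):
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_codes(query, catalogue=CATALOGUE):
    """Return catalogue codes ordered by decreasing similarity to the query."""
    q = trigrams(query)
    scored = [(cosine(q, trigrams(label)), code) for code, label in catalogue.items()]
    return [code for _, code in sorted(scored, reverse=True)]

if __name__ == "__main__":
    ranked = rank_codes("akuter Herzinfarkt")
    print(ranked[0])      # best-ranked four-digit code
    print(ranked[0][:3])  # its three-digit category
```

Because the comparison works on character sequences alone, it needs no dictionary: a query such as "akuter Herzinfarkt" shares trigrams with "Akuter Myokardinfarkt" through the common word parts, without any knowledge of medical morphemes.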