Hahn U, Romacker M, Schulz S
Freiburg University, Computational Linguistics Lab, Germany.
Int J Med Inform. 1999 Jan;53(1):1-28. doi: 10.1016/s1386-5056(98)00091-4.
The automatic analysis of medical narratives currently suffers from neglecting text structure phenomena such as referential relations between discourse units. This has unwarranted effects on the descriptional adequacy of medical knowledge bases automatically generated from texts. The resulting representation bias can be characterized in terms of incomplete, artificially fragmented, and referentially invalid knowledge structures. We focus here on four basic types of textual reference relations, viz. pronominal anaphora, nominal anaphora, textual ellipsis, and metonymy, and show how to deal with them in an adequate text parsing device. Since the types of reference relations we discuss show an increasing dependence on conceptual background knowledge, we stress the need for formally grounded, expressive conceptual representation systems for medical knowledge. Our suggestions are based on experience with MEDSYNDIKATE, a medical text knowledge acquisition system designed to properly deal with various sorts of discourse structure phenomena.