Department of Computer Science, UIUC, Urbana, IL 61801, USA.
J Am Med Inform Assoc. 2013 Mar-Apr;20(2):356-62. doi: 10.1136/amiajnl-2011-000767. Epub 2012 Jul 10.
This paper presents a coreference resolution system for clinical narratives. Coreference resolution aims at clustering all mentions in a single document into coherent entities.
A knowledge-intensive approach to coreference resolution is employed. The domain knowledge used includes several domain-specific lists, knowledge-intensive mention parsing, and a task-informed discourse model. Mention parsing allows us to abstract over the surface form of a mention and represent each mention using a higher-level representation, which we call the mention's semantic representation (SR). SR reduces the mention to a standard form and hence provides better support for comparing and matching. Existing coreference resolution systems tend to ignore discourse aspects and rely heavily on lexical and structural cues in the text. The authors break from this tradition and present a discourse model for "person" type mentions in clinical narratives, which greatly simplifies coreference resolution.
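The idea of reducing a mention to a standard form before matching can be illustrated with a minimal Python sketch. It is not the authors' parser: the word lists, the `SemanticRepresentation` fields, and the `parse_mention` and `compatible` helpers are all hypothetical stand-ins chosen only to show why comparing normalized forms is easier than comparing surface strings.

```python
from dataclasses import dataclass

# Hypothetical, tiny stand-ins for domain-specific lists (assumptions,
# not the resources described in the paper).
DETERMINERS = {"the", "a", "an", "his", "her", "this", "that"}
SYNONYMS = {"cxr": "chest x-ray", "chest xray": "chest x-ray"}


@dataclass(frozen=True)
class SemanticRepresentation:
    """Illustrative standard form: a head word plus a set of modifiers."""
    head: str
    modifiers: frozenset


def parse_mention(text):
    """Map a surface mention to a normalized form (toy heuristic)."""
    # Lowercase and drop determiners and possessives.
    tokens = [t for t in text.lower().split() if t not in DETERMINERS]
    # Replace known abbreviations or variants with a canonical phrase.
    phrase = SYNONYMS.get(" ".join(tokens), " ".join(tokens))
    tokens = phrase.split()
    if not tokens:
        return SemanticRepresentation(head="", modifiers=frozenset())
    # Assumption: the last token is the head, the rest are modifiers.
    return SemanticRepresentation(head=tokens[-1], modifiers=frozenset(tokens[:-1]))


def compatible(a, b):
    """Two mentions may corefer if heads match and modifiers do not conflict."""
    same_head = a.head == b.head
    nested = a.modifiers <= b.modifiers or b.modifiers <= a.modifiers
    return same_head and nested
```

Under these assumptions, "the CXR" and "a chest x-ray" map to the same representation, while "left knee" and "right knee" share a head but are rejected because their modifiers conflict.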
This system was evaluated on four different datasets that were made available in the 2011 i2b2/VA coreference challenge. The unweighted average of F1 scores (over B-cubed, MUC and CEAF) varied from 84.2% to 88.1%. These experiments show that domain knowledge is effective across mention types on all four datasets.
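For readers unfamiliar with the scoring, the following Python sketch shows how one of the three metrics, B-cubed, is commonly computed and how an unweighted average of several F1 scores is taken. It is an illustration, not the challenge's official scorer: it assumes gold and predicted clusterings cover the same set of mentions, and the function names `b_cubed` and `unweighted_average` are hypothetical. MUC and CEAF are not implemented here.

```python
def b_cubed(gold, pred):
    """Return (precision, recall, F1) for two clusterings.

    gold and pred are lists of sets of mention ids. Assumption: both
    clusterings cover exactly the same mentions.
    """
    gold_of = {m: frozenset(c) for c in gold for m in c}
    pred_of = {m: frozenset(c) for c in pred for m in c}
    mentions = list(gold_of)
    # Per-mention overlap between its gold cluster and predicted cluster,
    # normalized by the predicted cluster (precision) or gold cluster (recall).
    precision = sum(
        len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def unweighted_average(scores):
    """Plain arithmetic mean, giving each metric equal weight."""
    return sum(scores) / len(scores)


# Toy clusterings, invented for illustration only.
gold = [{"a", "b", "c"}, {"d"}]
pred = [{"a", "b"}, {"c", "d"}]
p, r, f = b_cubed(gold, pred)
```

On this toy input the precision is 0.75 and the recall is 2/3. In an evaluation like the one described, the B-cubed F1 would be passed to `unweighted_average` together with the MUC and CEAF F1 scores produced by their own scorers.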
Error analysis shows that most of the recall errors made by the system can be handled by adding further domain knowledge. The precision errors, on the other hand, are more subtle and indicate that building a robust coreference system requires understanding the relations in which mentions participate.
This paper presents an approach that makes extensive use of domain knowledge to significantly improve coreference resolution. The authors state that their system and the knowledge sources developed will be made publicly available.