Centre for Health Informatics, City University, London, UK.
J Biomed Inform. 2012 Oct;45(5):901-12. doi: 10.1016/j.jbi.2012.02.012. Epub 2012 Mar 17.
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but although coreference resolution is a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA), and describes a method for generating coreference chains using progressively pruned linked lists that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results give F-measures of 79.2% for the ODIE corpus and 87.5% for the i2b2/VA corpus. A baseline of blind coreference, in which mentions of the same class are linked, gives F-measures of 65.3% and 51.9%, respectively. For the ODIE corpus, recall is significantly improved over the baseline (p<0.05), but there was no statistically significant improvement in F-measure (p>0.05). For the i2b2/VA corpus, recall, precision, and F-measure are all significantly improved over the baseline (p<0.05). Overall, our approach offers performance at least as good as that of human annotators and greatly increased performance over general-purpose tools. The system uses a number of open-source components that are available to download.
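To make the baseline concrete, the following is a minimal sketch of blind coreference, in which every mention of the same class is placed in one chain, scored with a pairwise link-based F-measure. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which evaluation metrics were used, so the pairwise scoring is an assumption, and the mention identifiers, class labels, and function names (`blind_chains`, `links`, `pairwise_prf`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blind_chains(mentions):
    """Blind baseline: put all mentions of the same class into one chain."""
    by_class = defaultdict(list)
    for mention_id, mention_class in mentions:
        by_class[mention_class].append(mention_id)
    # A chain needs at least two mentions to express any coreference link.
    return [ids for ids in by_class.values() if len(ids) > 1]


def links(chains):
    """Return every unordered pair of mentions that share a chain."""
    return {frozenset(pair) for chain in chains for pair in combinations(chain, 2)}


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall, and F-measure (assumed metric, see lead-in)."""
    pred_links, gold_links = links(predicted), links(gold)
    true_positives = len(pred_links & gold_links)
    precision = true_positives / len(pred_links) if pred_links else 0.0
    recall = true_positives / len(gold_links) if gold_links else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy document: (mention id, class) pairs and gold chains.
mentions = [
    ("m1", "problem"), ("m2", "problem"), ("m3", "problem"),
    ("m4", "treatment"), ("m5", "treatment"),
]
gold = [["m1", "m2"], ["m4", "m5"]]

precision, recall, f_measure = pairwise_prf(blind_chains(mentions), gold)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

On this toy input the baseline recovers every gold link but also links `m3` to the other two problem mentions, so recall is perfect while precision drops. This is the kind of over-linking that blind same-class coreference tends to produce.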