Liu Sijia, Liu Hongfang, Chaudhary Vipin, Li Dingcheng
University at Buffalo, the State University of New York, Buffalo, NY;
Mayo Clinic, Rochester, MN.
AMIA Jt Summits Transl Sci Proc. 2016 Jul 22;2016:428-37. eCollection 2016.
It is widely acknowledged that natural language processing is indispensable for processing electronic health records (EHRs). However, poor performance on relation detection tasks such as coreference resolution (identifying linguistic expressions that refer to the same entity or event) may degrade the quality of EHR processing. Hence, there is a critical need to advance research on relation detection from EHRs. Most clinical coreference resolution systems are based on either supervised machine learning or rule-based methods. The need for a manually annotated corpus hampers the use of such systems at large scale. In this paper, we present an infinite mixture model method using definite sampling to resolve coreferent relations among mentions in clinical notes. A similarity measure function is proposed to determine the coreferent relations. Our system achieved an F-measure of 0.847 on the i2b2 2011 coreference corpus. These promising results and the unsupervised nature of the method make it possible to apply the system in big-data clinical settings.
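The abstract does not specify the model or the similarity function, so the following is only an illustrative sketch of the general idea behind mixture-style clustering with an open-ended number of clusters: each mention either joins an existing coreference cluster or opens a new one. It is not the authors' method. The names `token_similarity`, `cluster_mentions`, and `alpha` are hypothetical, Jaccard token overlap stands in for the paper's similarity measure, and a deterministic greedy assignment replaces the sampling step described in the paper.

```python
from typing import List


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens (a stand-in similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_mentions(mentions: List[str], alpha: float = 0.5) -> List[List[str]]:
    """Greedy assignment in the spirit of a Chinese restaurant process.

    The number of clusters is not fixed in advance. `alpha` is an assumed
    weight for opening a new cluster; an existing cluster must score
    strictly higher than `alpha` to receive the mention.
    """
    clusters: List[List[str]] = []
    for mention in mentions:
        best_index, best_score = -1, alpha
        for index, members in enumerate(clusters):
            # Score = cluster size times the best similarity to any member.
            best_match = max(token_similarity(mention, m) for m in members)
            score = len(members) * best_match
            if score > best_score:
                best_index, best_score = index, score
        if best_index < 0:
            clusters.append([mention])  # no cluster beat alpha: open a new one
        else:
            clusters[best_index].append(mention)
    return clusters


if __name__ == "__main__":
    example = ["chest pain", "severe chest pain", "knee fracture", "left knee fracture"]
    for cluster in cluster_mentions(example):
        print(cluster)
```

In this sketch, larger clusters attract new mentions more strongly, which mirrors the "rich get richer" behavior of such models. A faithful implementation would sample assignments from the model's posterior and use the similarity function defined in the paper.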