Department of Computer Science, Columbia University, New York City, NY, USA.
Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.
J Biomed Inform. 2018 Feb;78:78-86. doi: 10.1016/j.jbi.2017.12.016. Epub 2018 Jan 9.
To date, the methods developed for automated extraction of information from radiology reports have been mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts have produced automated systems for entity detection, but little work has been done to automatically extract relations and their associated named entities from narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations from radiology reports in an unsupervised way, without specifying prior domain knowledge. We propose a hybrid approach to information extraction (IE) that combines dependency-based parse trees with distributed semantics to generate structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also scale easily to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules.
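As a rough illustration of the hybrid idea described above, the following minimal sketch fills an information frame for one finding by walking a dependency tree and assigning each attached word to a frame slot by vector similarity. It is not the authors' implementation: the hand-written dependency edges, the two-dimensional word vectors, the slot names (`shape`, `location`), the seed words, and the helper names (`descendants`, `build_frame`) are all hypothetical and chosen only to make the example self-contained.

```python
import math

# Hypothetical hand-written dependency parse of
# "An oval mass is seen in the left breast", as (head, dependent) pairs.
# A real system would obtain these edges from a dependency parser.
EDGES = [("seen", "mass"), ("mass", "oval"), ("mass", "breast"), ("breast", "left")]

# Hypothetical 2-d word vectors standing in for learned distributed semantics.
VECTORS = {
    "oval": (0.9, 0.1), "round": (1.0, 0.0),                          # shape-like
    "breast": (0.1, 0.9), "left": (0.2, 0.8), "axilla": (0.0, 1.0),  # location-like
}

# One seed word per frame slot (an assumption, not the paper's slot inventory).
SLOT_SEEDS = {"shape": "round", "location": "axilla"}


def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def descendants(head, edges):
    """Collect every word below `head` in the dependency tree, depth-first."""
    found = []
    for h, d in edges:
        if h == head:
            found.append(d)
            found.extend(descendants(d, edges))
    return found


def build_frame(finding, edges):
    """Assign each word under the finding to the most similar slot seed."""
    frame = {"finding": finding}
    for word in descendants(finding, edges):
        if word not in VECTORS:
            continue  # no vector available, so the word cannot be placed
        slot = max(
            SLOT_SEEDS,
            key=lambda s: cosine(VECTORS[word], VECTORS[SLOT_SEEDS[s]]),
        )
        frame.setdefault(slot, []).append(word)
    return frame


frame = build_frame("mass", EDGES)
print(frame)
```

In this toy setting the dependency tree decides which words belong to the finding, and the vector similarity decides which slot each word fills, so no extraction rule is written per slot.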