Liang Zihong, Chen Junjie, Xu Zhaopeng, Chen Yuyang, Hao Tianyong
School of Computer Science, South China Normal University, Guangzhou, China.
Software College, Northeastern University, Shenyang, China.
Front Artif Intell. 2019 May 14;2:1. doi: 10.3389/frai.2019.00001. eCollection 2019.
Identifying medical entities and relations in electronic medical records is a fundamental research problem in medical informatics. Extracting useful knowledge from these records is challenging because of their high complexity, and the accurate identification of entities and relations remains an open problem in medical information extraction. We propose a pattern-based method for extracting specific tumor-related entities and attributes from unstructured Chinese diagnostic imaging text. The method consists of three steps. First, a keyword-matching algorithm obtains the primary site of the tumor. Second, a set of regular expressions identifies the size of the primary tumor. Third, a set of rules acquires the metastatic sites of the tumor. Under an overall weighted metric, the method achieves a recall of 0.697, a precision of 0.825, and an F1 score of 0.755. The F1 scores for the three extraction tasks are 0.784, 0.822, and 0.740, respectively. The method is stable and robust across different amounts of test data, and it achieved comparatively high performance in the CHIP 2018 open challenge, demonstrating its effectiveness in extracting tumor-related entities from Chinese diagnostic imaging text.
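The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword lists, cue words, negation cues, size pattern, and all function names below are hypothetical stand-ins chosen for illustration, since the abstract does not give the actual lexicon, regular expressions, or rules.

```python
import re

# Hypothetical lexicons; the paper's actual keyword lists are not given in the abstract.
SITE_KEYWORDS = ["肝", "肺", "胃", "肾", "乳腺", "淋巴结", "骨"]
TUMOR_CUES = ["占位", "肿块", "肿物", "癌", "结节"]
METASTASIS_CUES = ["转移"]
NEGATION_CUES = ["未见", "无明显"]

# Hypothetical size pattern: matches "3.2cm", "3.2×2.1cm", "32mm*21mm*10mm", etc.
# Up to two leading dimensions (unit optional), then a final dimension with a unit.
SIZE_PATTERN = re.compile(
    r"(?:\d+(?:\.\d+)?\s*(?:cm|mm)?\s*[×x*]\s*){0,2}\d+(?:\.\d+)?\s*(?:cm|mm)",
    re.IGNORECASE,
)


def split_sentences(text):
    """Split a report into rough sentences on common Chinese and ASCII delimiters."""
    return [s for s in re.split(r"[。；;\n]", text) if s.strip()]


def has_any(sentence, cues):
    return any(cue in sentence for cue in cues)


def find_primary_site(text):
    """Step 1 (keyword matching): earliest site keyword in the first sentence
    that has a tumor cue and no metastasis cue."""
    for sentence in split_sentences(text):
        if has_any(sentence, METASTASIS_CUES) or not has_any(sentence, TUMOR_CUES):
            continue
        hits = [(sentence.find(s), s) for s in SITE_KEYWORDS if s in sentence]
        if hits:
            return min(hits)[1]
    return None


def find_primary_size(text, primary_site):
    """Step 2 (regular expressions): first size expression in a non-metastatic
    sentence that mentions the primary site."""
    if primary_site is None:
        return None
    for sentence in split_sentences(text):
        if primary_site in sentence and not has_any(sentence, METASTASIS_CUES):
            match = SIZE_PATTERN.search(sentence)
            if match:
                return match.group(0)
    return None


def find_metastatic_sites(text, primary_site):
    """Step 3 (rules): site keywords in sentences with a metastasis cue,
    skipping negated sentences and the primary site itself."""
    found = []
    for sentence in split_sentences(text):
        if not has_any(sentence, METASTASIS_CUES) or has_any(sentence, NEGATION_CUES):
            continue
        for site in SITE_KEYWORDS:
            if site in sentence and site != primary_site and site not in found:
                found.append(site)
    return found


def extract(text):
    """Run the three steps in order and collect the results."""
    site = find_primary_site(text)
    return {
        "primary_site": site,
        "primary_size": find_primary_size(text, site),
        "metastatic_sites": find_metastatic_sites(text, site),
    }


# Invented example report, for illustration only.
sample = "右肺上叶见肿块，大小约3.2cm×2.1cm。纵隔淋巴结肿大，考虑转移。肝内未见转移。"
print(extract(sample))
```

A sketch of this kind is only as good as its lexicons and rules: substring matching on single characters such as "肝" or "骨" will over-match in real reports, so a practical system would likely need longer anatomical terms and more careful negation handling.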