Center of Excellence in Research and Education for Big Military Data Intelligence (CREDIT), Department of Electrical and Computer Engineering, Prairie View A&M University, Texas A&M University System, Prairie View, Texas 77446, United States of America.
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
PLoS One. 2019 May 2;14(5):e0216046. doi: 10.1371/journal.pone.0216046. eCollection 2019.
Specific entity terms such as diseases, tests, symptoms, and genes can be extracted from Electronic Medical Records (EMRs) by Named Entity Recognition (NER). However, the limited availability of labeled EMRs poses a great challenge for mining medical entity terms. In this study, a novel multitask bi-directional RNN model combined with deep transfer learning is proposed as a potential solution that uses knowledge transfer and data augmentation to enhance NER performance when data are limited. The proposed model was evaluated using micro-average F-score, macro-average F-score, and accuracy. The proposed model outperforms the baseline model on the discharge datasets. For discharge summaries, the micro-average F-score improves by 2.55% and the overall accuracy improves by 7.53%. For progress notes, the micro-average F-score and the overall accuracy improve by 1.63% and 5.63%, respectively.
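The three evaluation measures named above differ in how they aggregate per-class results, and a small sketch may make the distinction concrete. The code below is illustrative only and is not the authors' evaluation script: it assumes token-level scoring, a hypothetical "O" tag for non-entity tokens, and made-up labels and predictions. The function name `micro_macro_f1` is likewise an assumption introduced for this example.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F-score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, ignore="O"):
    """Token-level micro F, macro F, and accuracy (assumed scoring scheme).

    `ignore` is the hypothetical non-entity tag; it counts toward accuracy
    but is excluded from the F-score classes.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore:
                tp[g] += 1
        else:
            if p != ignore:
                fp[p] += 1  # predicted an entity class that is wrong here
            if g != ignore:
                fn[g] += 1  # missed the true entity class

    labels = sorted(set(tp) | set(fp) | set(fn))
    # Micro average: pool the counts over all classes, then compute one F-score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro average: compute an F-score per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold) if gold else 0.0
    return micro, macro, accuracy


# Made-up example data, not taken from the study.
gold = ["disease", "O", "test", "symptom", "O", "disease"]
pred = ["disease", "O", "symptom", "symptom", "test", "O"]
micro, macro, accuracy = micro_macro_f1(gold, pred)
print(f"micro F = {micro:.4f}, macro F = {macro:.4f}, accuracy = {accuracy:.4f}")
```

Because the micro average pools counts, frequent entity classes dominate it, whereas the macro average weights every class equally, so a rare class that is recognized poorly lowers the macro score more than the micro score.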