IEEE J Biomed Health Inform. 2021 Oct;25(10):3804-3811. doi: 10.1109/JBHI.2021.3099755. Epub 2021 Oct 5.
The growing use of electronic health records in the medical domain generates a large amount of medical data, much of it stored as clinical notes. These notes are rich in clinical entities such as diseases, treatments, tests, drugs, genes, and proteins. Extracting these entities is challenging because clinical notes are written in natural language. Clinical entity extraction has many useful applications, including clinical notes analysis, medical data privacy, decision support systems, and disease analysis. Although various machine learning and deep learning models have been developed for this task, building an accurate model remains difficult. This study presents a novel deep learning-based technique for extracting clinical entities from clinical notes. The proposed model uses both local and global context, in contrast to existing models that use only global context. The proposed model, which combines CNN, Bi-LSTM, and CRF with non-complex embeddings, outperforms existing models by 4-10% and 5-12% in F1-score on the i2b2-2010 and i2b2-2012 datasets, respectively. Accurate detection of clinical entities can support the privacy preservation of medical data, which in turn increases the trust of users and medical organizations in sharing medical data.
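To make the F1-score comparison concrete, the following is a minimal sketch of how entity-level F1 is commonly computed for sequence-labeling output. The abstract does not state the exact scoring protocol, so the BIO tagging scheme, the exact-match criterion, the micro-averaging, the label names (`problem`, `test`), and the helper names `bio_to_spans` and `entity_f1` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); illustrative representation
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity-level F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)       # spans with matching type and boundaries
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the prediction finds one of the two gold entities.
gold = [["B-problem", "I-problem", "O", "B-test"]]
pred = [["B-problem", "I-problem", "O", "O"]]
score = entity_f1(gold, pred)
```

In this toy case precision is 1.0 and recall is 0.5, so the score is 2/3. Under exact matching, a predicted span counts only if both its type and its boundaries agree with the gold annotation.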