Text Analytics and NLP Lab, Department of Computer Applications, NIT Trichy, India.
Artif Intell Med. 2021 Feb;112:102008. doi: 10.1016/j.artmed.2021.102008. Epub 2021 Jan 7.
In the last few years, people have begun to share a great deal of health-related information in the form of tweets, reviews, and blog posts. These user-generated clinical texts can be mined for useful insights. However, automatic analysis of clinical text requires the identification of standard medical concepts. Most existing deep learning based medical concept normalization systems are built on CNNs or RNNs, and their performance is limited because they must be trained from scratch (except for the embeddings). In this work, we propose a medical concept normalization system based on BERT and a highway layer. BERT, a pre-trained context-sensitive deep language representation model, has advanced the state of the art in many NLP tasks, and the gating mechanism in the highway layer helps the model select only the important information. Experimental results show that our model outperforms all existing methods on two standard datasets. Further, we conduct a series of experiments to study the impact on our model of different learning rates and batch sizes, noise, and freezing of encoder layers.
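The abstract does not specify the exact architecture, but the gating mechanism it mentions is the standard highway layer (Srivastava et al., 2015), which mixes a transformed representation with the unchanged input via a learned sigmoid gate. A minimal numpy sketch of one such layer, applied here to an illustrative BERT-base-sized vector (the 768-dimensional size and random weights are assumptions, not the paper's trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: output = g * H(x) + (1 - g) * x.

    The transform gate g = sigmoid(x @ W_t + b_t) decides, per dimension,
    how much of the transformed representation H(x) = relu(x @ W_h + b_h)
    to pass through versus carrying the input x unchanged.
    """
    h = np.maximum(0.0, x @ W_h + b_h)   # transformed representation H(x)
    g = sigmoid(x @ W_t + b_t)           # transform gate, values in (0, 1)
    return g * h + (1.0 - g) * x         # gated mix of transform and carry

# Toy usage on a vector the size of a BERT-base hidden state (illustrative).
rng = np.random.default_rng(0)
d = 768
x = rng.standard_normal(d)
W_h, b_h = 0.01 * rng.standard_normal((d, d)), np.zeros(d)
W_t, b_t = 0.01 * rng.standard_normal((d, d)), np.full(d, -1.0)  # bias gate toward carry
y = highway_layer(x, W_h, b_h, W_t, b_t)
print(y.shape)  # (768,)
```

Biasing `b_t` negative, as in the original highway-network recipe, starts the layer close to an identity mapping, which keeps training stable when the layer sits on top of a pre-trained encoder.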