Ji Zongcheng, Wei Qiang, Xu Hua
School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:269-277. eCollection 2020.
Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT), and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state of the art for many natural language processing tasks. In this study, we proposed an entity normalization architecture by fine-tuning the pre-trained BERT/BioBERT/ClinicalBERT models and conducted extensive experiments to evaluate the effectiveness of the pre-trained models for biomedical entity normalization using three different types of datasets. Our experimental results show that the best fine-tuned models consistently outperformed previous methods and advanced the state of the art for biomedical entity normalization, with up to a 1.17% increase in accuracy.
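The abstract describes fine-tuning pre-trained BERT-family models for entity normalization. A minimal sketch of one common way to cast this task is shown below: score mention–candidate concept-name pairs with a fine-tuned sequence-classification head and pick the highest-scoring candidate. The checkpoint name, scoring head, and example strings are illustrative assumptions, not the authors' released code or exact architecture.

```python
# Sketch: entity normalization as mention-candidate pair scoring with a BERT-family
# encoder. Checkpoint, head, and examples below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: does this candidate concept name match the mention?
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def score_candidates(mention: str, candidate_names: list[str]) -> list[float]:
    """Return a match probability for each candidate concept name."""
    enc = tokenizer(
        [mention] * len(candidate_names),   # sentence A: the raw mention
        candidate_names,                    # sentence B: a candidate concept name
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[:, 1].tolist()

# Usage: normalize a mention by taking the highest-scoring candidate.
candidates = ["Myocardial Infarction", "Cardiac Arrest"]
scores = score_candidates("heart attack", candidates)
best_concept = max(zip(candidates, scores), key=lambda pair: pair[1])[0]
print(best_concept, scores)
```

In practice the classification head would first be fine-tuned on labeled mention–concept pairs (e.g., with the Hugging Face `Trainer`) before scoring; the snippet above only shows the inference-time pairing and ranking step.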