Adversarial active learning for the identification of medical concepts and annotation inconsistency.

Affiliations

Department of IT Center, the Children's Hospital, Zhejiang University School of Medicine, China; National Clinical Research Center for Child Health, China.

Department of Artificial Intelligence, Enterprise Institute, Ewell Technology, China.

Publication information

J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.

Abstract

OBJECTIVE

Named entity recognition (NER) is a principal task in the biomedical field, and deep learning-based algorithms have been widely applied to biomedical NER. However, all of these methods, when applied to biomedical corpora, use only annotated samples to maximize their performance. Thus, (1) large numbers of unannotated samples are discarded and their value is overlooked. (2) Compared with other types of active learning (AL) algorithms, generative adversarial network (GAN)-based AL methods have developed slowly. Furthermore, current diversity-based AL methods compute similarities only between pairs of sentences and cannot evaluate distribution similarities between groups of sentences. Annotation inconsistency is another significant challenge in the biomedical annotation field. Most existing methods for addressing it are statistics-based or rule-based, and (3) they require substantial expert knowledge and complex designs. To address challenges (1), (2), and (3) simultaneously, we propose innovative algorithms.

METHODS

This paper introduces GAN into biomedical NER and proposes the GAN-bidirectional long short-term memory-conditional random field (GAN-BiLSTM-CRF) and the GAN-bidirectional encoder representations from transformers-conditional random field (GAN-BERT-CRF) models, each of which can serve as an NER model, an AL model, and a model for identifying erroneous labels. BiLSTM-CRF or BERT-CRF is defined as the generator, and a convolutional neural network (CNN)-based network serves as the discriminator. (1) The generator employs unannotated samples in addition to annotated samples to maximize NER performance. (2) The outputs of the CRF layer and the discriminator are used to select unlabeled samples for the AL task. (3) The discriminator distinguishes the distribution of erroneous labels from that of correct labels, identifies erroneous labels, and thereby addresses the annotation inconsistency challenge.
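Point (2) can be illustrated with a minimal sketch of a sample-selection step. The abstract does not give the scoring rule, so everything here is an assumption made for illustration: the function name `select_for_annotation`, the linear combination with a `weight` parameter, and the convention that a high discriminator score means "resembles the annotated pool". It is not the authors' implementation.

```python
from typing import List


def select_for_annotation(
    crf_confidence: List[float],
    disc_score: List[float],
    budget: int,
    weight: float = 0.5,
) -> List[int]:
    """Rank unlabeled sentences and return the indices to send to annotators.

    crf_confidence[i]: probability the tagger assigns to its best label
        sequence for sentence i (a low value means an uncertain prediction).
    disc_score[i]: discriminator output in [0, 1]. ASSUMPTION: a high value
        means "looks like an annotated sample", so a low value marks a
        sentence that differs from the labeled pool.
    weight: hypothetical trade-off between uncertainty and novelty.
    """
    if len(crf_confidence) != len(disc_score):
        raise ValueError("score lists must have the same length")
    scored = []
    for i, (conf, disc) in enumerate(zip(crf_confidence, disc_score)):
        uncertainty = 1.0 - conf  # tagger is unsure about its labels
        novelty = 1.0 - disc      # sentence is unlike the annotated data
        scored.append((weight * uncertainty + (1.0 - weight) * novelty, i))
    # Highest combined score first; ties are broken by original index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:budget]]


picked = select_for_annotation(
    crf_confidence=[0.95, 0.40, 0.70, 0.55],
    disc_score=[0.90, 0.80, 0.20, 0.30],
    budget=2,
)
print(picked)  # [3, 2]
```

In this toy input, sentence 1 has the least confident tagging, yet sentences 3 and 2 are picked because the discriminator also rates them as unlike the annotated pool. That is the kind of group-level signal a purely pairwise similarity measure would not provide.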

RESULTS

The corpus from the 2010 i2b2/VA NLP challenge and the Chinese CCKS-2017 Task 2 dataset are adopted for experiments. Compared to the baseline BiLSTM-CRF and BERT-CRF, the GAN-BiLSTM-CRF and GAN-BERT-CRF models achieved significant improvements in precision, recall, and F1 scores in terms of NER performance. Learning curves in AL experiments show the comparative results of the proposed models. Furthermore, the trained discriminator can identify samples with incorrect medical labels in both simulation and real-world experimental environments.
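For readers unfamiliar with how precision, recall, and F1 are usually computed for NER, the following is a minimal sketch of entity-level, exact-match scoring over BIO tags. The abstract does not state the matching criterion used in the paper, so exact-match scoring is an assumption, and the helper names and example labels are illustrative only.

```python
from typing import List, Set, Tuple


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An I- tag with no open span of the same type is ignored.
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span and type match."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-problem", "I-problem", "O", "B-test", "O"]
pred = ["B-problem", "I-problem", "O", "O", "B-treatment"]
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here one of the two predicted entities matches one of the two gold entities, so all three scores are 0.5.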

CONCLUSION

Introducing GAN yields significant results in terms of NER, active learning, and the ability to identify incorrectly annotated samples. The benefits of GAN will be studied further.
