Adversarial active learning for the identification of medical concepts and annotation inconsistency.

Affiliations

Department of IT Center, the Children's Hospital, Zhejiang University School of Medicine, China; National Clinical Research Center for Child Health, China.

Department of Artificial Intelligence, Enterprise Institute, Ewell Technology, China.

Publication information

J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.

PMID:32687985

Abstract

OBJECTIVE

Named entity recognition (NER) is a principal task in the biomedical field, and deep learning-based algorithms have been widely applied to biomedical NER. However, all of these methods, when applied to biomedical corpora, use only annotated samples to maximize performance. As a result, (1) large numbers of unannotated samples are discarded and their value is overlooked. (2) Compared with other types of active learning (AL) algorithms, AL methods based on generative adversarial networks (GANs) have developed slowly. Furthermore, current diversity-based AL methods only compute similarities between pairs of sentences and cannot evaluate distribution similarities between groups of sentences. Annotation inconsistency is one of the significant challenges in biomedical annotation, and most existing methods for addressing it are statistics-based or rule-based. (3) These methods require substantial expert knowledge and complex designs. To address challenges (1), (2), and (3) simultaneously, we propose new GAN-based algorithms.
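To make the pairwise-versus-group distinction concrete, here is a minimal sketch, assuming each sentence has already been embedded as a fixed-length vector. Cosine similarity compares one sentence with another; the biased squared maximum mean discrepancy (MMD) with an RBF kernel is one standard way to compare two groups of sentences. MMD, the kernel, and all function names here are illustrative choices, not the paper's method, which uses a trained CNN discriminator in this role.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Pairwise view: similarity between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rbf(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two embeddings."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def mmd2(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Group view: biased estimate of squared MMD between two sets of embeddings."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Toy 2-D "embeddings" (hypothetical data).
labeled = [[1.0, 0.0], [0.0, 1.0]]
batch_a = [[1.0, 0.0], [1.0, 0.0]]  # concentrated on one region of the labeled set
batch_b = [[1.0, 0.0], [0.0, 1.0]]  # same spread as the labeled set

# Pairwise view: both batches contain a sentence identical to a labeled one.
print(max(cosine(s, t) for s in batch_a for t in labeled),
      max(cosine(s, t) for s in batch_b for t in labeled))  # 1.0 1.0
# Group view: only batch_a is distributed differently from the labeled set.
print(round(mmd2(labeled, batch_a), 3), round(mmd2(labeled, batch_b), 3))  # 0.432 0.0
```

The pairwise scores cannot tell the two batches apart, while the group-level score can, which is the gap the abstract attributes to current diversity-based AL methods.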

METHODS

This paper introduces GANs and proposes the GAN-bidirectional long short-term memory-conditional random field (GAN-BiLSTM-CRF) and GAN-bidirectional encoder representations from transformers-conditional random field (GAN-BERT-CRF) models, each of which can serve as an NER model, an AL model, and a model for identifying erroneous labels. BiLSTM-CRF or BERT-CRF serves as the generator, and a convolutional neural network (CNN) serves as the discriminator. (1) The generator uses unannotated samples in addition to annotated samples to maximize NER performance. (2) The outputs of the CRF layer and the discriminator are used to select unlabeled samples for the AL task. (3) The discriminator distinguishes the distribution of erroneous labels from that of correct labels, identifies erroneous labels, and thereby addresses the annotation inconsistency challenge.
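Points (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' formula: the abstract does not say how the CRF and discriminator outputs are combined, so the linear mix `alpha`, the `threshold`, and every name below (`Candidate`, `al_score`, `select_for_annotation`, `flag_suspect_labels`) are hypothetical. The sketch assumes the CRF yields the probability of its best tag path and the discriminator yields a probability that a sample resembles correctly labeled data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    sent_id: str
    crf_path_prob: float   # hypothetical: probability the CRF assigns to its best tag path
    disc_real_prob: float  # hypothetical: discriminator's P(sample looks like labeled data)


def al_score(c: Candidate, alpha: float = 0.5) -> float:
    """Higher score = more worth sending to annotators (assumed linear mix)."""
    uncertainty = 1.0 - c.crf_path_prob  # generator is unsure of its own tagging
    novelty = 1.0 - c.disc_real_prob     # discriminator sees it as unlike labeled data
    return alpha * uncertainty + (1.0 - alpha) * novelty


def select_for_annotation(pool: List[Candidate], k: int, alpha: float = 0.5) -> List[str]:
    """Return the ids of the k highest-scoring unlabeled sentences (AL step)."""
    ranked = sorted(pool, key=lambda c: al_score(c, alpha), reverse=True)
    return [c.sent_id for c in ranked[:k]]


def flag_suspect_labels(disc_correct_prob: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return ids of annotated sentences whose labels the discriminator scores as likely wrong."""
    return sorted(sid for sid, p in disc_correct_prob.items() if p < threshold)


pool = [
    Candidate("s1", crf_path_prob=0.95, disc_real_prob=0.90),
    Candidate("s2", crf_path_prob=0.40, disc_real_prob=0.30),
    Candidate("s3", crf_path_prob=0.60, disc_real_prob=0.80),
]
print(select_for_annotation(pool, k=2))  # ['s2', 's3']
print(flag_suspect_labels({"a": 0.92, "b": 0.31, "c": 0.07}))  # ['b', 'c']
```

Setting `alpha` to 1.0 reduces the score to plain CRF uncertainty sampling, and 0.0 relies on the discriminator alone, so the mix makes the contribution of each signal easy to compare.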

RESULTS

The corpus from the 2010 i2b2/VA NLP challenge and the Chinese CCKS-2017 Task 2 dataset are used for the experiments. Compared with the baseline BiLSTM-CRF and BERT-CRF models, the GAN-BiLSTM-CRF and GAN-BERT-CRF models achieved significant improvements in NER precision, recall, and F1 score. Learning curves from the AL experiments present comparative results for the proposed models. Furthermore, the trained discriminator can identify samples with incorrect medical labels in both simulated and real-world experimental settings.
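For reference, NER precision, recall, and F1 are usually computed over entity spans rather than individual tokens. The sketch below assumes BIO tagging and strict matching (an entity counts as correct only if its start, end, and type all match), micro-averaged over sentences; the abstract does not state the exact matching criterion used, and the tag names are illustrative.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from BIO tags; a stray I- tag starts a new span."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Strict, micro-averaged entity-level precision, recall, and F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-problem", "I-problem", "O", "B-treatment"]]
pred = [["B-problem", "I-problem", "O", "O"]]
p, r, f = prf1(gold, pred)
print(p, r, round(f, 3))  # 1.0 0.5 0.667
```

Here the model finds the problem entity exactly but misses the treatment, so precision stays at 1.0 while recall drops to 0.5.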

CONCLUSION

Introducing GANs produces significant results in NER, active learning, and the identification of incorrectly annotated samples. The benefits of GANs will be studied further.

Similar articles

Extracting comprehensive clinical information for breast cancer using deep learning methods.

Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.

Biomedical named entity recognition using deep neural networks with contextual information.

BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training.

J Biomed Inform. 2019 Aug;96:103252. doi: 10.1016/j.jbi.2019.103252. Epub 2019 Jul 16.

Fast and effective biomedical named entity recognition using temporal convolutional network with conditional random field.

Math Biosci Eng. 2020 May 12;17(4):3553-3566. doi: 10.3934/mbe.2020200.

Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition.

Comput Biol Med. 2019 May;108:122-132. doi: 10.1016/j.compbiomed.2019.04.002. Epub 2019 Apr 7.

Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach.

Int J Environ Res Public Health. 2019 Sep 27;16(19):3628. doi: 10.3390/ijerph16193628.

A study of active learning methods for named entity recognition in clinical text.

J Biomed Inform. 2015 Dec;58:11-18. doi: 10.1016/j.jbi.2015.09.010. Epub 2015 Sep 15.

Use of BERT (Bidirectional Encoder Representations from Transformers)-Based Deep Learning Method for Extracting Evidences in Chinese Radiology Reports: Development of a Computer-Aided Liver Cancer Diagnosis Framework.

J Med Internet Res. 2021 Jan 12;23(1):e19689. doi: 10.2196/19689.

Cited by

Utilizing active learning strategies in machine-assisted annotation for clinical named entity recognition: a comprehensive analysis considering annotation costs and target effectiveness.

J Am Med Inform Assoc. 2024 Nov 1;31(11):2632-2640. doi: 10.1093/jamia/ocae197.

Named Entity Recognition in Electronic Health Records: A Methodological Review.

Healthc Inform Res. 2023 Oct;29(4):286-300. doi: 10.4258/hir.2023.29.4.286. Epub 2023 Oct 31.

Natural language processing in clinical neuroscience and psychiatry: A review.

Front Psychiatry. 2022 Sep 14;13:946387. doi: 10.3389/fpsyt.2022.946387. eCollection 2022.

Data governance system of the National Clinical Research Center for Child Health in China.

Transl Pediatr. 2021 Jul;10(7):1905-1913. doi: 10.21037/tp-21-272.
