Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training.

Affiliations

School of Science, China Pharmaceutical University, Nanjing, China.

Adverse Drug Reaction Monitoring Center of Wuxi, Wuxi, China.

Publication information

J Biomed Inform. 2019 Aug;96:103252. doi: 10.1016/j.jbi.2019.103252. Epub 2019 Jul 16.

DOI:10.1016/j.jbi.2019.103252

PMID:31323311

Abstract

BACKGROUND

Adverse Drug Event Reports (ADERs) from the spontaneous reporting system are an important data source for studying Adverse Drug Reactions (ADRs) and for post-marketing pharmacovigilance. Beyond the conventional ADR information contained in the structured section of ADERs, the free-text section describes more detailed information, such as pre- and post-ADR symptoms, multi-drug use and ADR-relief treatments, which can be mined with Natural Language Processing (NLP) tools.

OBJECTIVE

The goal of this study was to extract ADR-related entities from the free-text section of Chinese ADERs, to supplement the information contained in the structured section and further assist ADR evaluation.

METHODS

Three models were constructed to perform Named Entity Recognition (NER) on the free-text section of Chinese ADERs: a Conditional Random Field (CRF), a Bidirectional Long Short-Term Memory-CRF (BiLSTM-CRF) and a Lexical Feature based BiLSTM-CRF (LF-BiLSTM-CRF). Tri-training, a semi-supervised learning method, was then applied on the basis of the three established models to assign reliable tags to un-annotated raw data.
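The abstract does not state how tri-training decided which tags were reliable. As a rough illustration only, the sketch below assumes the classic tri-training agreement rule: an un-annotated sentence is added to one model's training pool when the other two models predict exactly the same tag sequence for it. The function name `select_pseudo_labels` and the tag names `B-ADR`, `I-ADR` and `B-DRUG` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple

TagSeq = Tuple[str, ...]


def select_pseudo_labels(
    predictions: Sequence[Sequence[Sequence[str]]],
) -> Dict[int, List[Tuple[int, TagSeq]]]:
    """Pick pseudo-labelled sentences for each of three models.

    predictions[m][s] is model m's tag sequence for un-annotated sentence s.
    For each target model, a sentence is selected when the OTHER two models
    agree on every tag (assumed rule; not confirmed by the abstract).
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three models")
    n_sentences = len(predictions[0])
    selected: Dict[int, List[Tuple[int, TagSeq]]] = {0: [], 1: [], 2: []}
    for target in range(3):
        # Indices of the two "teacher" models for this target model.
        a, b = [m for m in range(3) if m != target]
        for s in range(n_sentences):
            tags_a = tuple(predictions[a][s])
            tags_b = tuple(predictions[b][s])
            if tags_a == tags_b:
                selected[target].append((s, tags_a))
    return selected


# Hypothetical predictions for two un-annotated sentences.
crf = [["B-ADR", "I-ADR", "O"], ["O", "O"]]
bilstm_crf = [["B-ADR", "I-ADR", "O"], ["B-DRUG", "O"]]
lf_bilstm_crf = [["B-ADR", "O", "O"], ["B-DRUG", "O"]]

extra = select_pseudo_labels([crf, bilstm_crf, lf_bilstm_crf])
```

In a full loop, each model would be retrained on its original annotated data plus its entry in `extra`, and the selection would repeat until no model gains new sentences. Sentences on which no pair of models agrees stay un-annotated, which is consistent with the abstract's report that only part of the raw data received tags.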

RESULTS

Among the three basic models, the LF-BiLSTM-CRF achieved the highest average F1 score, 94.35%. After tri-training, almost half of the un-annotated cases had been tagged, and the performance of all three models improved with iterative training.
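The abstract does not say whether F1 was computed per token or per entity, nor how the average was taken. As one plausible reading, the sketch below computes an exact-match, entity-level F1 from BIO tag sequences. The BIO scheme, the helper names `extract_spans` and `entity_f1`, and the tag names are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def extract_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[Sequence[str]], pred: List[Sequence[str]]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = extract_spans(g), extract_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of two gold entities.
gold = [["B-ADR", "I-ADR", "O", "B-DRUG"]]
pred = [["B-ADR", "I-ADR", "O", "O"]]
score = entity_f1(gold, pred)
```

Here precision is 1.0 and recall is 0.5, so `score` is 2/3. Exact-match scoring is stricter than token-level scoring because a span with one wrong boundary counts as both a missed and a spurious entity.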

CONCLUSIONS

The LF-BiLSTM-CRF model that we constructed achieved a comparatively high F1 score, and the fusion of CRF, BiLSTM-CRF and LF-BiLSTM-CRF in tri-training might further strengthen the reliability of predicted tags. The results suggest that our methods are useful for developing specialized NER tools to identify ADR-related information in Chinese ADERs.

Similar articles

Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training.

J Biomed Inform. 2019 Aug;96:103252. doi: 10.1016/j.jbi.2019.103252. Epub 2019 Jul 16.

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

Chinese-Named Entity Recognition From Adverse Drug Event Records: Radical Embedding-Combined Dynamic Embedding-Based BERT in a Bidirectional Long Short-term Conditional Random Field (Bi-LSTM-CRF) Model.

JMIR Med Inform. 2021 Dec 1;9(12):e26407. doi: 10.2196/26407.

Chinese clinical named entity recognition with radical-level feature and self-attention mechanism.

J Biomed Inform. 2019 Oct;98:103289. doi: 10.1016/j.jbi.2019.103289. Epub 2019 Sep 18.

Research on named entity recognition of adverse drug reactions based on NLP and deep learning.

Front Pharmacol. 2023 Jun 1;14:1121796. doi: 10.3389/fphar.2023.1121796. eCollection 2023.

A hybrid approach for named entity recognition in Chinese electronic medical record.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):64. doi: 10.1186/s12911-019-0767-2.

Adversarial active learning for the identification of medical concepts and annotation inconsistency.

J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.

An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records.

BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):235. doi: 10.1186/s12911-019-0933-6.

A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):66. doi: 10.1186/s12911-019-0770-7.

Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods.

JMIR Med Inform. 2018 Dec 17;6(4):e50. doi: 10.2196/medinform.9965.

Cited by

Hybrid natural language processing tool for semantic annotation of medical texts in Spanish.

BMC Bioinformatics. 2025 Jan 8;26(1):7. doi: 10.1186/s12859-024-05949-6.

Application of knowledge graph in smart irrigation district management decision making.

Heliyon. 2024 Sep 24;10(19):e38398. doi: 10.1016/j.heliyon.2024.e38398. eCollection 2024 Oct 15.

Named Entity Recognition in Electronic Health Records: A Methodological Review.

Healthc Inform Res. 2023 Oct;29(4):286-300. doi: 10.4258/hir.2023.29.4.286. Epub 2023 Oct 31.

A comparison of few-shot and traditional named entity recognition models for medical text.

Proc (IEEE Int Conf Healthc Inform). 2022 Jun;2022:84-89. doi: 10.1109/ichi54592.2022.00024. Epub 2022 Sep 8.

MedLexSp - a medical lexicon for Spanish medical natural language processing.

J Biomed Semantics. 2023 Feb 2;14(1):2. doi: 10.1186/s13326-022-00281-5.

Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach.

J Cheminform. 2022 Aug 13;14(1):55. doi: 10.1186/s13321-022-00633-4.

Adoption of Dexmedetomidine in Different Doses at Different Timing in Perioperative Patients.

Biomed Res Int. 2022 Jul 15;2022:4008941. doi: 10.1155/2022/4008941. eCollection 2022.

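Chinese-Named Entity Recognition From Adverse Drug Event Records: Radical Embedding-Combined Dynamic Embedding-Based BERT in a Bidirectional Long Short-term Conditional Random Field (Bi-LSTM-CRF) Model.

JMIR Med Inform. 2021 Dec 1;9(12):e26407. doi: 10.2196/26407.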

A Year of Papers Using Biomedical Texts.

Yearb Med Inform. 2020 Aug;29(1):221-225. doi: 10.1055/s-0040-1701997. Epub 2020 Aug 21.

Chinese Emergency Event Recognition Using Conv-RDBiGRU Model.

Comput Intell Neurosci. 2020 May 21;2020:7090918. doi: 10.1155/2020/7090918. eCollection 2020.
