Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training.

Author affiliations

School of Science, China Pharmaceutical University, Nanjing, China.

Adverse Drug Reaction Monitoring Center of Wuxi, Wuxi, China.

Publication information

J Biomed Inform. 2019 Aug;96:103252. doi: 10.1016/j.jbi.2019.103252. Epub 2019 Jul 16.

Abstract

BACKGROUND

Adverse Drug Event Reports (ADERs) from the spontaneous reporting system are important data sources for studying Adverse Drug Reactions (ADRs) and for post-marketing pharmacovigilance. Apart from the conventional ADR information contained in the structured section of ADERs, more detailed information, such as pre- and post-ADR symptoms, multi-drug usage and ADR-relief treatments, is described in the free-text section and can be mined with Natural Language Processing (NLP) tools.

OBJECTIVE

The goal of this study was to extract ADR-related entities from the free-text section of Chinese ADERs, to supplement the information contained in the structured section and thereby further assist ADR evaluation.

METHODS

Three models, Conditional Random Field (CRF), Bidirectional Long Short-Term Memory-CRF (BiLSTM-CRF) and Lexical Feature based BiLSTM-CRF (LF-BiLSTM-CRF), were constructed to perform Named Entity Recognition (NER) on the free-text section of Chinese ADERs. Tri-training, a semi-supervised learning method, was then applied on top of the three established models to assign reliable tags to unannotated raw data.
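The agreement step at the heart of tri-training can be sketched as follows. This is a minimal illustration of the generic tri-training rule (a sentence is pseudo-labeled for one model when the other two models predict the same tag sequence), not the authors' implementation; the abstract does not state their exact agreement criterion. The function name `tri_training_round`, the model keys, and the tag names `B-ADR`, `I-ADR` and `B-DRUG` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

TagSeq = Tuple[str, ...]


def tri_training_round(
    predictions: Dict[str, List[Sequence[str]]],
) -> Dict[str, Dict[int, TagSeq]]:
    """For each model, collect unlabeled sentences on which the OTHER two agree.

    predictions maps a model name to its predicted tag sequence for every
    unlabeled sentence (same sentence order for all three models).
    Returns, per model, {sentence_index: agreed_tags} to add to that
    model's training set before it is retrained.
    """
    names = list(predictions)
    if len(names) != 3:
        raise ValueError("tri-training needs exactly three models")
    n_sentences = len(predictions[names[0]])
    pseudo: Dict[str, Dict[int, TagSeq]] = {name: {} for name in names}
    for target in names:
        other_a, other_b = [m for m in names if m != target]
        for i in range(n_sentences):
            tags_a = tuple(predictions[other_a][i])
            tags_b = tuple(predictions[other_b][i])
            # Assumption: agreement means an identical whole-sentence tagging.
            if tags_a == tags_b:
                pseudo[target][i] = tags_a
    return pseudo


# Toy predictions for two unlabeled sentences (hypothetical tag set).
preds = {
    "crf": [["B-ADR", "I-ADR", "O"], ["O", "O"]],
    "bilstm_crf": [["B-ADR", "I-ADR", "O"], ["B-DRUG", "O"]],
    "lf_bilstm_crf": [["B-ADR", "O", "O"], ["B-DRUG", "O"]],
}
pseudo_labels = tri_training_round(preds)
```

In a full loop, each model would be retrained on its original annotated data plus its pseudo-labeled sentences, and the round would repeat until no new sentences are added. Sentences on which no pair of models agrees stay unlabeled.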

RESULTS

Among the three basic models, the LF-BiLSTM-CRF achieved the highest average F1 score, 94.35%. After tri-training, almost half of the unannotated cases had been assigned labels, and the performance of all three models improved after iterative training.
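For readers unfamiliar with how an F1 score is typically computed for NER, the sketch below scores predictions at the entity level under a BIO tagging scheme: a predicted entity counts as correct only if its boundaries and type both match the gold annotation. The abstract does not say whether the reported F1 is entity-level or token-level, nor how the average is taken, so this is an assumption for illustration only; the helper names and tag names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# One sentence: the model finds the ADR entity but misses the drug entity.
gold_tags = [["B-ADR", "I-ADR", "O", "B-DRUG"]]
pred_tags = [["B-ADR", "I-ADR", "O", "O"]]
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

Here one of two gold entities is recovered with no false positives, so precision is 1.0, recall is 0.5, and F1 is their harmonic mean, 2/3.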

CONCLUSIONS

The LF-BiLSTM-CRF model that we constructed achieved a comparatively high F1 score, and the fusion of CRF, BiLSTM-CRF and LF-BiLSTM-CRF in tri-training might further strengthen the reliability of predicted tags. The results suggest that our methods are useful for developing specialized NER tools to identify ADR-related information in Chinese ADERs.

