Wu Hong, Ji Jiatong, Tian Haimei, Chen Yao, Ge Weihong, Zhang Haixia, Yu Feng, Zou Jianjun, Nakamura Mitsuhiro, Liao Jun
School of Science, China Pharmaceutical University, Nanjing, China.
School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China.
JMIR Med Inform. 2021 Dec 1;9(12):e26407. doi: 10.2196/26407.
As the variety of drugs grows, the incidence of adverse drug events (ADEs) is rising year by year. Large numbers of ADEs are recorded in electronic medical records and adverse drug reaction (ADR) reports, which are important sources of potential ADR information. Making this latent ADR information automatically available is essential for better postmarketing drug safety reevaluation and pharmacovigilance.
This study describes how to identify ADR-related information from Chinese ADE reports.
We built an automated tool named BBC-Radical, a model with 3 components: Bidirectional Encoder Representations from Transformers (BERT), bidirectional long short-term memory (bi-LSTM), and conditional random field (CRF). The model identifies ADR-related information in Chinese ADR reports. Token features and radical features of Chinese characters were used to represent the common meaning of a group of words. BERT and bi-LSTM-CRF were combined with these features to perform named entity recognition (NER) on the free-text section of 24,890 ADR reports collected by the Jiangsu Province Adverse Drug Reaction Monitoring Center from 2010 to 2016. In addition, a man-machine comparison experiment on ADE records from Drum Tower Hospital was designed to compare the NER performance of the BBC-Radical model with that of a manual method.
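As a rough illustration of the two ideas in this paragraph, the following minimal sketch pairs each character with its radical and decodes character-level tags into entities. It is not the authors' implementation: the tiny `RADICALS` table, the BIO tagging scheme, the `ADR` label, and the example sentence are all assumptions chosen for illustration, and no BERT, bi-LSTM, or CRF layer is included.

```python
# Hypothetical, hand-written radical table (an assumption for this sketch);
# a real system would load a complete character-to-radical dictionary.
RADICALS = {"疹": "疒", "瘙": "疒", "痒": "疒", "疼": "疒", "痛": "疒",
            "呕": "口", "吐": "口"}


def char_features(text):
    """Pair each character with its radical, or "<unk>" if it is not listed."""
    return [(ch, RADICALS.get(ch, "<unk>")) for ch in text]


def decode_bio(chars, tags):
    """Collect (entity_type, text) spans from character-level BIO tags."""
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                entities.append((current_type, "".join(current)))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the entity that is currently open.
            current.append(ch)
        else:
            # "O" or an inconsistent "I-" tag closes the open entity.
            if current:
                entities.append((current_type, "".join(current)))
            current, current_type = [], None
    if current:
        entities.append((current_type, "".join(current)))
    return entities


# Made-up example: "the patient developed rash and itching".
text = "患者出现皮疹瘙痒"
# Tags such as a sequence labeler might predict (hypothetical label set).
tags = ["O", "O", "O", "O", "B-ADR", "I-ADR", "B-ADR", "I-ADR"]

features = char_features(text)
entities = decode_bio(list(text), tags)
print(features)
print(entities)
```

In this toy example the characters 疹, 瘙, and 痒 all map to the sickness radical 疒, which is the kind of shared signal a radical feature can offer a tagger alongside the token itself.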
The NER model achieved relatively high performance, with a precision of 96.4%, recall of 96.0%, and F1 score of 96.2%. In the man-machine comparison experiment, the BBC-Radical model (precision 87.2%, recall 85.7%, and F1 score 86.4%) outperformed the manual method (precision 86.1%, recall 73.8%, and F1 score 79.5%) in the recognition task for each kind of entity, with the largest gap in recall.
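The F1 scores quoted here are consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short check below recomputes them from the reported precision and recall values; the function and dictionary names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs as reported in the abstract.
reported = {
    "NER model": (96.4, 96.0),
    "BBC-Radical, man-machine comparison": (87.2, 85.7),
    "manual method, man-machine comparison": (86.1, 73.8),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```

Rounded to one decimal place, the recomputed values match the reported 96.2%, 86.4%, and 79.5%.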
The proposed model was competitive in extracting ADR-related information from ADE reports. The results suggest that applying this method could help improve the quality of ADR reports and support postmarketing drug safety evaluation.