School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
J Am Med Inform Assoc. 2021 Jul 14;28(7):1393-1400. doi: 10.1093/jamia/ocab014.
Automated analysis of vaccine postmarketing surveillance narrative reports is important to understand the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related events from vaccine safety reports.
We collected Guillain-Barré syndrome (GBS)-related influenza vaccine safety reports submitted to the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected and manually annotated with major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional machine learning and deep learning algorithms were then evaluated for the extraction of these entities. We further pretrained a domain-specific BERT (Bidirectional Encoder Representations from Transformers) model using VAERS reports (VAERS BERT) and compared its performance with existing models.
Ninety-one VAERS reports were annotated, resulting in 2512 entities. The corpus was made publicly available to promote community efforts on vaccine AE identification. Deep learning-based methods (eg, bidirectional long short-term memory and BERT models) outperformed conventional machine learning-based methods (ie, conditional random fields with extensive features). The BioBERT large model achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while the VAERS BERT large models achieved the highest exact match F-1 scores on investigation and other_AE. An ensemble of these 2 models achieved the highest exact match microaveraged F-1 score (0.6802) and the second highest lenient match microaveraged F-1 score (0.8078) among peer models.
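The difference between exact match and lenient match microaveraged F-1 can be illustrated with a minimal Python sketch. The abstract does not describe the scoring script, so everything here is an assumption made for illustration: entities are represented as (label, start, end) character spans with an exclusive end, "lenient" is taken to mean that a predicted span and a gold span share the same label and overlap by at least one character, and the function names `spans_overlap` and `micro_f1` are hypothetical.

```python
from typing import List, Tuple

# An entity is (label, start, end) with an exclusive end offset.
# This representation is an assumption, not the paper's format.
Entity = Tuple[str, int, int]


def spans_overlap(a: Entity, b: Entity) -> bool:
    """Same label and at least one shared character (assumed lenient rule)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def micro_f1(gold_docs: List[List[Entity]],
             pred_docs: List[List[Entity]],
             lenient: bool = False) -> float:
    """Microaveraged F-1: counts are pooled over all documents and labels."""
    matched_pred = matched_gold = n_pred = n_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        n_gold += len(gold)
        n_pred += len(pred)
        if lenient:
            # A prediction counts if it overlaps any gold entity of its label.
            matched_pred += sum(any(spans_overlap(p, g) for g in gold) for p in pred)
            matched_gold += sum(any(spans_overlap(g, p) for p in pred) for g in gold)
        else:
            # Exact match: label and both boundaries must be identical.
            hits = len(set(gold) & set(pred))
            matched_pred += hits
            matched_gold += hits
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that finds the right entity type but clips its boundary is counted as an error by the exact score and as a hit by the lenient score, which is why a lenient score is never lower than the exact score computed on the same predictions.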