Wegner Philipp, Fröhlich Holger, Madan Sumit
Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany.
German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
PLOS Digit Health. 2025 Mar 18;4(3):e0000468. doi: 10.1371/journal.pdig.0000468. eCollection 2025 Mar.
Detecting adverse drug events (ADEs) of drugs that are already available on the market is an essential part of the pharmacovigilance work conducted by both medical regulatory bodies and the pharmaceutical industry. Concerns regarding drug safety as well as economic interests motivate the efforts to identify ADEs. In this context, social media platforms play an important role as a valuable source of ADE reports, particularly through posts discussing adverse events associated with specific drugs. Our study aims to assess the effectiveness of knowledge fusion approaches combined with transformer-based NLP models for extracting ADE mentions from diverse datasets, such as texts from Twitter, websites like askapatient.com, and drug labels. The extraction task is formulated as a named entity recognition (NER) problem. The proposed methodology applies fusion learning methods that enhance transformer-based language models with additional contextual knowledge from ontologies or knowledge graphs. Additionally, the study introduces a multi-modal architecture that combines transformer-based language models with graph attention networks (GAT) to identify ADE spans in textual data. A multi-modal model based on ERNIE enriched with drug knowledge reached an F1-score of 71.84% on the CADEC corpus. Additionally, a combination of a graph attention network with BERT resulted in an F1-score of 65.16% on the SMM4H corpus. Impressively, the same model achieved an F1-score of 72.50% on the PsyTAR corpus, 79.54% on the ADE corpus, and 94.15% on the TAC corpus. Except for the CADEC corpus, the knowledge fusion models consistently outperformed the baseline model, BERT. Our study demonstrates the significance of contextual knowledge in improving the performance of knowledge fusion models for detecting ADEs from various types of textual data.
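To make the described architecture concrete, the following is a minimal sketch (not the authors' released code) of the kind of multi-modal fusion the abstract outlines: a BERT token encoder combined with a small graph attention layer over knowledge-graph node embeddings, with the two representations fused for BIO-style ADE span tagging. The model name, embedding dimensions, single-head GAT formulation, and concatenation-based fusion are assumptions chosen for illustration only.

```python
# Illustrative sketch of knowledge fusion for ADE NER: BERT token states are
# concatenated with a pooled graph-attention representation of linked knowledge
# graph nodes before per-token classification. All hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


class SimpleGATLayer(nn.Module):
    """Single-head graph attention layer in plain PyTorch."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, in_dim); adj: (N, N) binary adjacency mask
        h = self.proj(node_feats)                                   # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )                                                           # (N, N, 2*out_dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))         # (N, N)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)                       # attention weights
        return F.elu(alpha @ h)                                     # (N, out_dim)


class BertGatAdeTagger(nn.Module):
    """Fuses BERT token states with a pooled GAT view of linked KG nodes."""

    def __init__(self, kg_dim: int = 64, num_labels: int = 3,
                 model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.gat = SimpleGATLayer(kg_dim, kg_dim)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + kg_dim, num_labels)    # B/I/O tags

    def forward(self, input_ids, attention_mask, kg_node_feats, kg_adj):
        token_states = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        kg_states = self.gat(kg_node_feats, kg_adj)                 # (N, kg_dim)
        kg_context = kg_states.mean(dim=0)                          # simple mean pooling
        kg_context = kg_context.expand(token_states.size(0),
                                       token_states.size(1), -1)
        fused = torch.cat([token_states, kg_context], dim=-1)       # concatenation fusion
        return self.classifier(fused)                               # per-token logits


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = BertGatAdeTagger()
    enc = tokenizer("The patient reported severe nausea after taking ibuprofen.",
                    return_tensors="pt")
    # Toy knowledge graph: four linked drug/effect nodes with random embeddings.
    nodes = torch.randn(4, 64)
    adj = torch.ones(4, 4)
    logits = model(enc["input_ids"], enc["attention_mask"], nodes, adj)
    print(logits.shape)  # (1, seq_len, 3) -> BIO logits per token
```

In practice, the knowledge-graph nodes would come from entity linking of drug mentions to an ontology or knowledge graph, and the fusion strategy could be replaced by attention-based or ERNIE-style knowledge injection; this sketch only illustrates the overall data flow.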