Multimodal model with text and drug embeddings for adverse drug reaction classification.

Author information

Kazan Federal University, 18 Kremlyovskaya street, Kazan, 420008, Russian Federation; Lomonosov Moscow State University, 1 Leninskie gory, Moscow, 119991, Russian Federation.

Kazan Federal University, 18 Kremlyovskaya street, Kazan, 420008, Russian Federation; Sber AI, 19 Vavilova St., Moscow, 117997, Russian Federation; National Research University Higher School of Economics, 11 Pokrovsky Bulvar, Moscow, 109028, Russian Federation.

Publication information

J Biomed Inform. 2022 Nov;135:104182. doi: 10.1016/j.jbi.2022.104182. Epub 2022 Sep 30.

Abstract

In this paper, we focus on the classification of tweets as sources of potential signals for adverse drug effects (ADEs) or adverse drug reactions (ADRs). Following the intuition that text and drug structure representations are complementary, we introduce a multimodal model with two components. These components are state-of-the-art BERT-based models for language understanding and molecular property prediction. Experiments were carried out on multilingual benchmarks of the Social Media Mining for Health Research and Applications (#SMM4H) initiative. Our models obtained state-of-the-art results of 0.61 F-measure and 0.57 F-measure on #SMM4H 2021 Shared Tasks 1a and 2 in English and Russian, respectively. On the classification of French tweets from #SMM4H 2020 Task 1, our approach advances the state of the art by an absolute gain of 8% in F-measure. Our experiments show that the molecular information obtained from neural networks is more beneficial for ADE classification than traditional molecular descriptors. The source code for our models is freely available at https://github.com/Andoree/smm4h_2021_classification.
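
As a rough illustration of the two-component idea, the sketch below combines a text embedding with a drug embedding and scores the result with a small classification head. It is not the authors' implementation. The abstract does not state how the two representations are combined, so the concatenation step, the logistic head, the toy vectors, and the function names `fuse`, `predict_proba`, and `f_measure` are all assumptions made for this example. The F-measure is assumed here to be the F1 score of the positive (ADE) class.

```python
import math

def fuse(text_vec, drug_vec):
    """Concatenate a text embedding and a drug embedding (assumed fusion step)."""
    return list(text_vec) + list(drug_vec)

def predict_proba(features, weights, bias):
    """Hypothetical logistic head: sigmoid of a weighted sum of the fused features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def f_measure(y_true, y_pred):
    """F1 of the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy stand-ins for the outputs of a text encoder and a molecular encoder.
text_embedding = [0.2, -0.1, 0.4]
drug_embedding = [0.7, 0.3]

features = fuse(text_embedding, drug_embedding)
weights = [0.5, -0.25, 1.0, 0.1, -0.3]  # illustrative values, not trained
probability = predict_proba(features, weights, bias=0.0)
label = 1 if probability >= 0.5 else 0
```

In a real system the two toy vectors would come from the pretrained encoders, and the head's weights would be learned on the labeled tweets.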
