Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA, USA.
Department of Computer Science, Rice University, Houston, TX, USA.
J Biomed Inform. 2021 Jul;119:103833. doi: 10.1016/j.jbi.2021.103833. Epub 2021 Jun 8.
Adverse Drug Events (ADEs) are prevalent, costly, and sometimes preventable. Post-marketing drug surveillance aims to monitor ADEs that occur after a drug is released to market. Reports of such ADEs are aggregated by reporting systems, such as the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). In this paper, we consider how best to represent data derived from FAERS reports for the purpose of detecting post-marketing surveillance signals, in order to inform regulatory decision making. In our previous work, we developed aer2vec, a method for deriving distributed representations (concept embeddings) of drugs and side effects from ADE reports, establishing the utility of distributional information for pharmacovigilance signal detection. Here, we advance this line of research by evaluating the utility of encoding orthographic and lexical information. We do so by adapting two Natural Language Processing methods, subword embedding and vector retrofitting, which were developed to encode such information into word embeddings. Models were compared on their ability to distinguish between positive and negative examples in a set of manually curated drug/ADE relationships. Both aer2vec enhancements offered performance advantages over baseline models, and the best performance was obtained when retrofitting and subword embeddings were applied in concert. In addition, this work demonstrates that models leveraging distributed representations do not require extensive manual preprocessing to perform well on this pharmacovigilance signal detection task, and may even benefit from information that would otherwise be lost during the normalization and standardization process.
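To make the retrofitting idea concrete, the following is a minimal sketch, assuming the commonly used iterative retrofitting update, in which each vector is pulled toward the average of its lexically linked neighbors while staying anchored to its original value. The abstract does not specify the exact variant applied to aer2vec, so this should not be read as the authors' implementation. The function name `retrofit`, the toy two-dimensional vectors, and the neighbor links are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def retrofit(
    embeddings: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Hypothetical helper: pull each vector toward its linked neighbors.

    Assumes the standard retrofitting update with neighbor weight
    beta = 1 / (number of neighbors); not taken from the paper itself.
    """
    # Work on a copy so the original (distributional) vectors stay intact.
    retrofitted = {name: list(vec) for name, vec in embeddings.items()}
    for _ in range(iterations):
        for concept, linked in neighbors.items():
            # Ignore links to unknown concepts and self-links.
            linked = [n for n in linked if n in retrofitted and n != concept]
            if concept not in retrofitted or not linked:
                continue
            beta = 1.0 / len(linked)
            original = embeddings[concept]
            updated = []
            for d in range(len(original)):
                neighbor_sum = sum(beta * retrofitted[n][d] for n in linked)
                # Weighted average of the original vector and its neighbors.
                updated.append(
                    (alpha * original[d] + neighbor_sum)
                    / (alpha + beta * len(linked))
                )
            retrofitted[concept] = updated
    return retrofitted


# Toy data (assumed for illustration): two lexically related side-effect
# terms and one drug that has no lexical links.
embeddings = {
    "nausea": [1.0, 0.0],
    "vomiting": [0.0, 1.0],
    "aspirin": [1.0, 1.0],
}
neighbors = {"nausea": ["vomiting"], "vomiting": ["nausea"]}

result = retrofit(embeddings, neighbors)
distance_before = math.dist(embeddings["nausea"], embeddings["vomiting"])
distance_after = math.dist(result["nausea"], result["vomiting"])
```

In this toy setting the two linked terms move closer together, while the unlinked drug vector is left unchanged; the `alpha` parameter controls how strongly each vector stays anchored to its original distributional value.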