Chopard Daphne, Treder Matthias S, Corcoran Padraig, Ahmed Nagheen, Johnson Claire, Busse Monica, Spasic Irena
School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom.
Centre for Trials Research, Cardiff University, Cardiff, United Kingdom.
JMIR Med Inform. 2021 Dec 24;9(12):e28632. doi: 10.2196/28632.
Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events.
This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns.
We used the Unified Medical Language System (UMLS) as the coding scheme. Because it integrates 217 source vocabularies, it enables coding against other relevant terminologies such as the International Classification of Diseases-10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify mentions of UMLS concepts. We then trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of UMLS concepts that represented adverse events and those that did not.
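The two-stage design described above (dictionary lookup to propose candidate concept mentions, then a binary classifier to keep only those that are adverse events) can be sketched as follows. This is a minimal illustration only, not the study's implementation: the toy dictionary stands in for MetaMap, the concept identifiers are placeholders rather than real UMLS codes, and the rule in `is_adverse_event` is a hypothetical stand-in for the trained BERT classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary standing in for MetaMap's UMLS lookup.
# The identifiers are placeholders, not real UMLS concept identifiers.
TOY_DICTIONARY = {
    "chest pain": "CUI_0001",
    "pneumonia": "CUI_0002",
    "diabetes": "CUI_0003",
}


@dataclass
class Mention:
    text: str
    concept_id: str
    start: int
    end: int


def find_mentions(narrative: str) -> list[Mention]:
    """Stage 1: dictionary lookup proposes every candidate concept mention."""
    lowered = narrative.lower()
    mentions = []
    for term, concept_id in TOY_DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            mentions.append(
                Mention(narrative[match.start():match.end()], concept_id,
                        match.start(), match.end())
            )
    return sorted(mentions, key=lambda m: m.start)


def is_adverse_event(narrative: str, mention: Mention) -> bool:
    """Stage 2: binary decision per mention.

    Placeholder rule (an assumption for illustration): a mention preceded by
    "history of" in the same sentence is treated as background, not an
    adverse event. The study uses a trained BERT classifier here instead.
    """
    sentence_start = narrative.rfind(".", 0, mention.start) + 1
    context = narrative[sentence_start:mention.start].lower()
    return "history of" not in context


def code_adverse_events(narrative: str) -> list[str]:
    """Return concept identifiers of mentions classified as adverse events."""
    return [m.concept_id for m in find_mentions(narrative)
            if is_adverse_event(narrative, m)]


example = "Patient has a history of diabetes. Admitted with chest pain and pneumonia."
coded = code_adverse_events(example)
```

Separating the stages means the lookup can favor recall, since the classifier is responsible for discarding concept mentions (such as medical history) that are not adverse events.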
The model achieved a high F1 score of 0.8080 despite the class imbalance. This is 10.15 percentage points lower than human-like performance but 17.45 percentage points higher than that of the baseline approach.
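As a worked illustration of these figures, the sketch below computes F1 from confusion-matrix counts and derives the human-like and baseline scores implied by the reported differences. The counts passed to `f1_score` are hypothetical; only 0.8080, 10.15, and 17.45 come from the text, and the two derived scores are simple arithmetic on those numbers rather than values quoted from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall.

    True negatives do not appear in the formula, which is why F1 is often
    preferred over accuracy when the negative class dominates.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to show the calculation.
toy_f1 = f1_score(tp=8, fp=2, fn=2)

# Scores implied by the reported percentage-point differences.
model_f1 = 0.8080
human_like_f1 = round(model_f1 + 0.1015, 4)  # 10.15 points above the model
baseline_f1 = round(model_f1 - 0.1745, 4)    # 17.45 points below the model
```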
These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.