Silverman Anna L, Sushil Madhumita, Bhasuran Balu, Ludwig Dana, Buchanan James, Racz Rebecca, Parakala Mahalakshmi, El-Kamary Samer, Ahima Ohenewaa, Belov Artur, Choi Lauren, Billings Monisha, Li Yan, Habal Nadia, Liu Qi, Tiwari Jawahar, Butte Atul J, Rudrapatna Vivek A
medRxiv. 2023 Sep 8:2023.09.06.23295149. doi: 10.1101/2023.09.06.23295149.
Outpatient clinical notes are a rich source of information regarding drug safety. However, data in these notes are currently underutilized for pharmacovigilance due to methodological limitations in text mining. Large language models (LLMs) like BERT have shown progress in a range of natural language processing tasks but have not yet been evaluated on adverse event detection.
We adapted a new clinical LLM, UCSF BERT, to identify serious adverse events (SAEs) occurring after treatment with a non-steroid immunosuppressant for inflammatory bowel disease (IBD). We compared this model to other language models that have previously been applied to adverse event (AE) detection.
We annotated 928 outpatient notes, one per IBD patient, for all SAE-associated hospitalizations occurring after treatment with a non-steroid immunosuppressant. These notes contained 703 SAEs in total, the most common of which was failure of intended efficacy. Of the 8 candidate models, UCSF BERT achieved the highest numerical performance at identifying drug-SAE pairs from this corpus (accuracy 88-92%, macro F1 61-68%), with 5-10% greater accuracy than previously published models. UCSF BERT was significantly superior at identifying hospitalization events emergent to medication use (p < 0.01).
LLMs like UCSF BERT achieve numerically superior accuracy on the challenging task of SAE detection from clinical notes compared to prior methods. Future work should extend this methodology with multi-center data and newer architectures such as GPT to improve model performance and evaluation. Our findings support the potential value of using large language models to enhance pharmacovigilance.