Deng Yu, Xing Yunzhao, Quach Jason, Chen Xiaotian, Wu Xiaoqiang, Zhang Yafei, Moureaud Charlotte, Yu Mengjia, Zhao Yujie, Wang Li, Zhong Sheng
Data & Statistical Sciences, AbbVie Inc, North Chicago, Illinois, USA.
Computer Science & Engineering, University of California San Diego, La Jolla, California, USA.
J Biopharm Stat. 2024 Sep 20:1-12. doi: 10.1080/10543406.2024.2403442.
Adverse drug events (ADEs) are one of the major causes of hospital admissions and are associated with increased morbidity and mortality. Post-marketing ADE identification is one of the most important phases of drug safety surveillance. Traditionally, data for post-marketing surveillance have come mainly from spontaneous reporting systems such as the Food and Drug Administration Adverse Event Reporting System (FAERS). Social media data such as posts on X (formerly Twitter) contain rich patient and medication information and could potentially accelerate drug surveillance research. However, ADE information in social media data is usually locked in free text, making it difficult for traditional statistical approaches to use. In recent years, large language models (LLMs) have shown promise in many natural language processing tasks. In this study, we developed several LLMs to perform ADE classification on X data. We fine-tuned various LLMs including BERT-base, Bio_ClinicalBERT, RoBERTa, and RoBERTa-large. We also experimented with ChatGPT few-shot prompting and with ChatGPT fine-tuned on the whole training data. We then evaluated model performance based on sensitivity, specificity, negative predictive value, positive predictive value, accuracy, F1-measure, and area under the ROC curve. Our results showed that RoBERTa-large achieved the best F1-measure (0.8) among all models, followed by the fine-tuned ChatGPT model with an F1-measure of 0.75. Our feature importance analysis, based on 1200 random samples and RoBERTa-large, showed that the most important features were "withdrawals"/"withdrawal", "dry", "dealing", "mouth", and "paralysis". The good model performance and clinically relevant features show the potential of LLMs in augmenting ADE detection for post-marketing drug safety surveillance.
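The evaluation metrics listed in the abstract can all be derived from a binary confusion matrix plus a ranking of predicted scores. The following is a minimal sketch in plain Python, not the authors' evaluation code. The function names (`binary_metrics`, `roc_auc`), the coding of ADE posts as label 1, and the 0.5 decision threshold in the demo are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # recall on the positive (ADE) class
    specificity: float  # recall on the negative (non-ADE) class
    ppv: float          # positive predictive value (precision)
    npv: float          # negative predictive value
    accuracy: float
    f1: float


def _ratio(num, den):
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics; labels are assumed to be 0 or 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, len(pairs)),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive is scored above a
    random negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration (not study data).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a few thousand posts; for larger sets a library routine such as scikit-learn's `roc_auc_score` would be the more practical choice.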