Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study.

Authors

Trevena William, Zhong Xiang, Alvarado Michelle, Semenov Alexander, Oktay Alp, Devlin Devin, Gohil Aarya Yogesh, Chittimouju Sai Harsha

Affiliations

Department of Industrial and Systems Engineering, The University of Florida, Gainesville, FL, United States.

Department of Industrial and Systems Engineering, The University of San Diego, San Diego, CA, United States.

Publication information

J Med Internet Res. 2025 Jan 30;27:e54601. doi: 10.2196/54601.

DOI:10.2196/54601

PMID:39883487

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11826943/

Abstract

BACKGROUND

The implementation of large language models (LLMs), such as BART (Bidirectional and Auto-Regressive Transformers) and GPT-4, has revolutionized the extraction of insights from unstructured text. These advancements have expanded into health care, allowing analysis of social media for public health insights. However, the detection of drug discontinuation events (DDEs) remains underexplored. Identifying DDEs is crucial for understanding medication adherence and patient outcomes.

OBJECTIVE

The aim of this study is to provide a flexible framework for investigating various clinical research questions in data-sparse environments. We provide an example of the utility of this framework by identifying DDEs and their root causes in an open-source web-based forum, MedHelp, and by releasing the first open-source DDE datasets to aid further research in this domain.

METHODS

We used several LLMs, including GPT-4 Turbo, GPT-4o, DeBERTa (Decoding-Enhanced Bidirectional Encoder Representations from Transformers with Disentangled Attention), and BART, among others, to detect DDEs and determine their root causes in user comments posted on MedHelp. Our study design relied on zero-shot classification, which allows these models to make predictions without task-specific training. We split user comments into sentences and applied different classification strategies to assess how well each model identified DDEs and their root causes.
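The sentence-level zero-shot setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the label wording, the 0.5 threshold, the regex sentence splitter, and the helper names (`split_sentences`, `make_bart_scorer`, `detect_ddes`) are all assumptions, since the abstract does not specify them. The only external call is the Hugging Face `transformers` zero-shot pipeline with the `facebook/bart-large-mnli` checkpoint, loaded lazily so the rest of the sketch runs without it.

```python
import re
from typing import Callable, Dict, List, Union

# Hypothetical label wording; the abstract does not state the labels used.
DDE_LABEL = "the patient stopped taking a medication"
OTHER_LABEL = "other"


def split_sentences(comment: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def make_bart_scorer() -> Callable[[str], float]:
    """Return a function mapping a sentence to its zero-shot DDE score.

    Needs the `transformers` package and downloads the model on first use.
    """
    from transformers import pipeline  # lazy import: optional dependency

    clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    def score(sentence: str) -> float:
        out = clf(sentence, candidate_labels=[DDE_LABEL, OTHER_LABEL])
        # The pipeline returns labels sorted by score; look up the DDE label.
        return dict(zip(out["labels"], out["scores"]))[DDE_LABEL]

    return score


def detect_ddes(
    comment: str,
    scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Dict[str, Union[str, float, bool]]]:
    """Score each sentence of a comment and flag those at or above threshold."""
    results = []
    for sentence in split_sentences(comment):
        s = scorer(sentence)
        results.append({"sentence": sentence, "score": s, "is_dde": s >= threshold})
    return results
```

Calling `detect_ddes(comment, make_bart_scorer())` would run the model on each sentence; any other callable that returns a score between 0 and 1 can be passed instead, which makes it easy to compare models under the same splitting and thresholding.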

RESULTS

Among the selected models, GPT-4o performed the best at determining the root causes of DDEs, predicting only 12.9% of root causes incorrectly (Hamming loss). Among the open-source models tested, BART demonstrated the best performance in detecting DDEs, achieving an F-score of 0.86, a false positive rate of 2.8%, and a false negative rate of 6.5%, all without any fine-tuning. DDEs made up only 10.7% (107/1000) of the dataset, which underscores the models' robustness in an imbalanced data context.
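For readers less familiar with these metrics, the sketch below shows how they are computed. The confusion-matrix counts in the usage example (100 true positives, 25 false positives, 7 false negatives, 868 true negatives) are not reported in the abstract; they are illustrative values we back-calculated so that they agree with the stated 107/1000 prevalence and the rounded rates, and the actual counts may differ. The multi-label matrices in the Hamming loss example are likewise toy data.

```python
from typing import Dict, Sequence


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """F-score (F1), false positive rate, and false negative rate."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of non-DDEs flagged as DDEs
        "fnr": fn / (fn + tp),  # share of true DDEs that were missed
    }


def hamming_loss(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Fraction of (sample, label) positions where the prediction is wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / total


# Illustrative counts only (assumed, consistent with the reported rates).
metrics = detection_metrics(tp=100, fp=25, fn=7, tn=868)
print({name: round(value, 3) for name, value in metrics.items()})

# Toy multi-label example: 2 sentences x 4 root-cause labels, 1 wrong cell.
print(hamming_loss([[1, 0, 0, 1], [0, 1, 0, 0]], [[1, 0, 0, 1], [0, 1, 1, 0]]))
```

A Hamming loss of 12.9% therefore means that about one in eight sentence-label decisions was wrong, not that one in eight sentences had an entirely incorrect set of root causes.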

CONCLUSIONS

This study demonstrated the effectiveness of open- and closed-source LLMs, such as GPT-4o and BART, for detecting DDEs and their root causes from publicly accessible data through zero-shot classification. The robust and scalable framework we propose can aid researchers in addressing data-sparse clinical research questions. The launch of open-access DDE datasets has the potential to stimulate further research and novel discoveries in this field.
