Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, United States.
Qualcomm Institute, University of California San Diego, La Jolla, CA, United States.
J Med Internet Res. 2024 May 2;26:e52499. doi: 10.2196/52499.
This study explores the potential of using large language models to assist content analysis, conducting a case study that identifies adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with that of human annotators in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Given the identical instructions provided to human annotators, ChatGPT closely approximated human results, with a high degree of agreement: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT can replicate human annotation accurately and efficiently. The study acknowledges possible limitations, including concerns about generalizability stemming from ChatGPT's training data, and calls for further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
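The agreement figures above combine a simple percent-agreement rate with a chance-corrected kappa statistic. As a rough illustration only, the sketch below computes percent agreement and Cohen's kappa (the two-rater analogue of the Fleiss κ reported in the study) for a pair of invented binary AE label sequences; all data and function names here are hypothetical, not taken from the study.

```python
# Hypothetical sketch: agreement between two raters' binary AE labels
# (e.g., a human annotator vs. an LLM). All labels below are invented.

def percent_agreement(a, b):
    """Fraction of items on which the two raters assign the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    po = percent_agreement(a, b)  # observed agreement
    # Expected chance agreement from each rater's marginal label rates.
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)

human = [1, 0, 0, 1, 0, 0, 1, 0]  # invented human labels (1 = AE present)
model = [1, 0, 0, 1, 0, 1, 1, 0]  # invented model labels

print(percent_agreement(human, model))  # 0.875 (7 of 8 items match)
print(cohens_kappa(human, model))       # 0.75
```

At the study's reported scale (94.4% agreement with κ=0.95), the kappa value indicates that agreement far exceeds what chance alone would produce.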