Kondo Takeshi, Miyachi Junichiro, Jönsson Anders, Nishigori Hiroshi
Department of General Medicine/Family & Community Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan.
Center for Medical Education, Nagoya University Graduate School of Medicine, Nagoya, Japan.
Nagoya J Med Sci. 2024 Nov;86(4):620-644. doi: 10.18999/nagjms.86.4.620.
Qualitative research, which analyses non-numerical data such as interview transcripts, is crucial for understanding medical education processes. However, it is often complex and time-consuming, which has generated interest in technologies that streamline the analysis. This study investigated the applicability of ChatGPT, a large language model, to thematic analysis in medical qualitative research. Previous research applied ChatGPT only to the deductive phase of qualitative analysis; this study evaluated thematic analysis including the inductive process conducted by ChatGPT, using human qualitative analysis as a reference. A convergent-design mixed-methods approach was used. Following a thematic analysis approach, ChatGPT (model: GPT-4) analysed interview data from a previously published medical research article. Three assessors evaluated the ChatGPT-driven qualitative analysis against the human-conducted analysis as a benchmark. ChatGPT scored higher on most criteria but showed variable transferability and mixed scores for depth. The integrated analysis, including the qualitative data, identified six themes: superficial similarity of results to human analysis, good first impression, explicit association with data and process, contamination by directions in prompts, deficiency of thick description grounded in context and research questions, and lack of theoretical derivation. ChatGPT excels at extracting key data points and summarising information; however, it is prone to prompt contamination, which necessitates careful scrutiny. To achieve deeper analysis, it is essential to supplement the research context with human input and to explore the theoretical framework.