Generative AI for thematic analysis in a maternal health study: coding semistructured interviews using large language models.

Author information

Qiao Shan, Fang Xingyu, Wang Junbo, Zhang Ran, Li Xiaoming, Kang Yuhao

Affiliations

Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA.

South Carolina SmartState Center for Healthcare Quality (CHQ), University of South Carolina, Columbia, South Carolina, USA.

Publication information

Appl Psychol Health Well Being. 2025 Jun;17(3):e70038. doi: 10.1111/aphw.70038.

DOI:10.1111/aphw.70038

PMID:40377231

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12083056/

Abstract

STUDY OBJECTIVES

The coding of semistructured interview transcripts is a critical step for thematic analysis of qualitative data. However, the coding process is often labor-intensive and time-consuming. The emergence of generative artificial intelligence (GenAI) presents new opportunities to enhance the efficiency of qualitative coding. This study proposed a computational pipeline using GenAI to automatically extract themes from interview transcripts.

METHODS

Using transcripts from interviews conducted with maternity care providers in South Carolina, we leveraged ChatGPT for inductive coding to generate codes from interview transcripts without a predetermined coding scheme. Structured prompts were designed to instruct ChatGPT to generate and summarize codes. The performance of GenAI was evaluated by comparing the AI-generated codes with those generated manually.
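As a rough illustration of what a structured prompt for inductive coding might look like, the sketch below assembles a prompt string and parses a line-based reply. It is not the authors' pipeline: the function names, the prompt wording, the `label | summary` reply format, and the `max_codes` parameter are all assumptions made for this example, and no model API is called.

```python
from typing import List, Tuple


def build_inductive_coding_prompt(excerpt: str, max_codes: int = 5) -> str:
    """Assemble a structured prompt asking a chat model for inductive codes.

    The wording and output format here are hypothetical, not taken from the study.
    """
    return "\n".join([
        "You are assisting with thematic analysis of a semistructured interview.",
        "No predetermined coding scheme is provided; derive codes from the text itself.",
        f"Return at most {max_codes} codes, one per line, formatted as:",
        "<code label> | <one-sentence summary>",
        "",
        "Transcript excerpt:",
        excerpt.strip(),
    ])


def parse_codes(response_text: str) -> List[Tuple[str, str]]:
    """Parse 'label | summary' lines from a model reply, skipping anything else."""
    codes = []
    for line in response_text.splitlines():
        if "|" not in line:
            continue  # ignore lines that do not follow the requested format
        label, summary = line.split("|", 1)
        if label.strip():
            codes.append((label.strip(), summary.strip()))
    return codes


if __name__ == "__main__":
    # Placeholder text only; a real run would pass a de-identified transcript excerpt.
    print(build_inductive_coding_prompt("[interview excerpt goes here]", max_codes=3))
```

Requesting a fixed, line-based reply format keeps the parsing step simple and makes malformed replies easy to detect, which matters when the generated codes are later compared against manual ones.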

RESULTS

GenAI demonstrated promise in detecting and summarizing codes from interview transcripts. ChatGPT exhibited an overall accuracy exceeding 80% in inductive coding. More impressively, GenAI reduced the time required for coding by 81%.
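The abstract does not define how accuracy or time savings were computed. One plausible reading, sketched below under that assumption, treats accuracy as the share of manually derived codes that also appear among the AI-generated codes, and time reduction as the relative drop in coding time. The function names and the exact-match rule are hypothetical, and the numbers in the usage lines are illustrative rather than study data.

```python
from typing import Iterable


def code_accuracy(manual_codes: Iterable[str], ai_codes: Iterable[str]) -> float:
    """Share of manual codes matched (case-insensitively) by an AI-generated code."""
    manual = {c.strip().lower() for c in manual_codes}
    ai = {c.strip().lower() for c in ai_codes}
    if not manual:
        raise ValueError("manual_codes must not be empty")
    return len(manual & ai) / len(manual)


def time_reduction(manual_minutes: float, ai_minutes: float) -> float:
    """Relative reduction in coding time, as a fraction of the manual time."""
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return (manual_minutes - ai_minutes) / manual_minutes


if __name__ == "__main__":
    # Illustrative inputs only, not taken from the study.
    print(code_accuracy(["A", "B", "C", "D"], ["a", "b", "c", "x"]))
    print(time_reduction(100, 19))
```

Exact string matching is the strictest possible comparison; in practice, judging whether two codes express the same idea usually needs human review or a semantic similarity measure.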

DISCUSSION

GenAI models are capable of efficiently processing language datasets and performing multi-level semantic identification. However, challenges such as inaccuracy, systematic biases, and privacy concerns must be acknowledged and addressed. Future research should focus on refining these models to enhance reliability and address inherent limitations associated with their application in qualitative research.

Abstract (translation)