Department of Ophthalmology and Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA, United States.
Center for Vulnerable Populations, Zuckerberg San Francisco General Hospital, Department of Medicine, University of California San Francisco, San Francisco, CA, United States.
JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.
Manually analyzing public health-related content from social media provides valuable insights into individuals' beliefs, attitudes, and behaviors, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort required from well-trained human subject matter experts make extensive manual social media listening infeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings from large sets of social media posts and reasonably report health-related themes.
We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large collections of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts?
We asked the same research question and used the same set of social media content for both LLM topic selection and LLM thematic analysis as in a published study about vaccine rhetoric, in which those analyses were conducted manually. Using the results of that study as the benchmark for this experiment, we compared the prior manual human analyses with analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed whether the LLMs had equivalent ability and how consistent each LLM's repeated analyses were.
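To make this repeated-run design concrete, the following Python sketch shows how one might submit an identical topic-selection prompt to each model several times to gauge both between-model differences and within-model consistency. This is an illustration only, not the study's implementation: the query_model function, the prompt wording, and the repeat count are hypothetical placeholders.

from typing import Dict, List

MODELS = ["GPT4-32K", "Claude-instant-100K", "Claude-2-100K"]
N_RUNS = 3  # hypothetical number of repeated runs per model

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a call to the given LLM's API."""
    raise NotImplementedError

def rank_topics(posts: List[str]) -> Dict[str, List[str]]:
    # One fixed prompt, so that differences across runs reflect
    # model variability rather than prompt variability.
    prompt = ("Rank the content areas of the following social media "
              "posts by relevance to vaccine rhetoric:\n" + "\n".join(posts))
    results: Dict[str, List[str]] = {}
    for model in MODELS:
        # Repeat the identical prompt to assess within-model consistency.
        results[model] = [query_model(model, prompt) for _ in range(N_RUNS)]
    return results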
The LLMs generally ranked highly the topics that humans had previously chosen as most relevant. We rejected the null hypothesis (P<.001, overall comparison) and concluded that these LLMs were more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. Regarding theme identification, the LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Although the LLMs did not consistently match the human-generated themes, subject matter experts judged the LLM-generated themes to be reasonable and relevant.
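To illustrate the "by chance" baseline behind this comparison: if a model's top-k ranking were drawn uniformly at random from n candidate content areas, the probability that it contains all 5 human-rated top areas has a simple hypergeometric form. The sketch below is a worked illustration of that baseline, not the study's exact test, and the topic counts used in the example are hypothetical, not the study's values.

from math import comb

def p_top5_by_chance(n_topics: int, k: int) -> float:
    """Probability that a uniformly random top-k list drawn from
    n_topics candidate content areas contains all 5 of the
    human-rated top-5 areas: C(n-5, k-5) / C(n, k)."""
    if k < 5:
        return 0.0
    return comb(n_topics - 5, k - 5) / comb(n_topics, k)

# Hypothetical example: 20 candidate content areas, top-10 ranking.
print(p_top5_by_chance(20, 10))  # ~0.016, so chance inclusion is unlikely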
LLMs can effectively and efficiently process large social media-based health-related data sets and can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested could replicate the depth of human subject matter expert analysis by consistently extracting the same themes from the same data. Once better validated, automated LLM-based real-time social listening has vast potential for common and rare health conditions, informing public health understanding of the public's interests and concerns and identifying the public's ideas for addressing them.