Perils and opportunities in using large language models in psychological research.

Author information

Abdurahman Suhaib, Atari Mohammad, Karimi-Malekabadi Farzan, Xue Mona J, Trager Jackson, Park Peter S, Golazizian Preni, Omrani Ali, Dehghani Morteza

Affiliations

Department of Psychology, University of Southern California, Los Angeles, CA 90089, USA.

Brain and Creativity Institute, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

PNAS Nexus. 2024 Jul 16;3(7):pgae245. doi: 10.1093/pnasnexus/pgae245. eCollection 2024 Jul.

DOI:10.1093/pnasnexus/pgae245

PMID:39015547

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11249969/

Abstract

The emergence of large language models (LLMs) has sparked considerable interest in their potential application in psychological research, mainly as a model of the human psyche or as a general text-analysis tool. However, the trend of using LLMs without sufficient attention to their limitations and risks, which we rhetorically refer to as "GPTology", can be detrimental given the easy access to models such as ChatGPT. Beyond existing general guidelines, we investigate the current limitations, ethical implications, and potential of LLMs specifically for psychological research, and show their concrete impact in various empirical studies. Our results highlight the importance of recognizing global psychological diversity, cautioning against treating LLMs (especially in zero-shot settings) as universal solutions for text analysis, and developing transparent, open methods to address LLMs' opaque nature for reliable, reproducible, and robust inference from AI-generated data. Acknowledging LLMs' utility for task automation, such as text annotation, or to expand our understanding of human psychology, we argue for diversifying human samples and expanding psychology's methodological toolbox to promote an inclusive, generalizable science, countering homogenization and over-reliance on LLMs.

Similar articles

The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review.

JMIR Med Inform. 2024 May 10;12:e53787. doi: 10.2196/53787.

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.

Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard.

J Med Internet Res. 2024 May 17;26:e54758. doi: 10.2196/54758.

Assessing the research landscape and clinical utility of large language models: a scoping review.

BMC Med Inform Decis Mak. 2024 Mar 12;24(1):72. doi: 10.1186/s12911-024-02459-6.

AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories.

Perspect Psychol Sci. 2024 Sep;19(5):808-826. doi: 10.1177/17456916231214460. Epub 2024 Jan 2.

Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.

J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.

Generative artificial intelligence in healthcare from the perspective of digital media: Applications, opportunities and challenges.

Heliyon. 2024 Jun 5;10(12):e32364. doi: 10.1016/j.heliyon.2024.e32364. eCollection 2024 Jun 30.

Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals.

J Med Internet Res. 2024 Apr 25;26:e56764. doi: 10.2196/56764.

Utility of artificial intelligence-based large language models in ophthalmic care.

Ophthalmic Physiol Opt. 2024 May;44(3):641-671. doi: 10.1111/opo.13284. Epub 2024 Feb 25.

Cited by

Quantifying and explaining the rise of fiction.

Evol Hum Sci. 2025 Jul 14;7:e20. doi: 10.1017/ehs.2025.10011. eCollection 2025.

Social media perceptions of college football performance and season length 2019-2023.

PLoS One. 2025 Jul 1;20(7):e0325840. doi: 10.1371/journal.pone.0325840. eCollection 2025.

Using Natural Language Processing to Track Negative Emotions in the Daily Lives of Adolescents.

Res Sq. 2025 Apr 17:rs.3.rs-6414400. doi: 10.21203/rs.3.rs-6414400/v1.

Evaluation of Six Large Language Models for Clinical Decision Support: Application in Transfusion Decision-making for RhD Blood-type Patients.

Ann Lab Med. 2025 Sep 1;45(5):520-529. doi: 10.3343/alm.2024.0588. Epub 2025 Apr 28.

New opportunities and challenges for conservation evidence synthesis from advances in natural language processing.

Conserv Biol. 2025 Apr;39(2):e14464. doi: 10.1111/cobi.14464.

Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation.

PNAS Nexus. 2025 Mar 12;4(3):pgaf089. doi: 10.1093/pnasnexus/pgaf089. eCollection 2025 Mar.

Wisdom of the silicon crowd: LLM ensemble prediction capabilities rival human crowd accuracy.

Sci Adv. 2024 Nov 8;10(45):eadp1528. doi: 10.1126/sciadv.adp1528.

From discovery to innovation in physiological research.

Exp Physiol. 2025 Mar;110(3):355-357. doi: 10.1113/EP092125. Epub 2024 Oct 28.

Cultural bias and cultural alignment of large language models.

PNAS Nexus. 2024 Sep 17;3(9):pgae346. doi: 10.1093/pnasnexus/pgae346. eCollection 2024 Sep.

GPT is an effective tool for multilingual psychological text analysis.

Proc Natl Acad Sci U S A. 2024 Aug 20;121(34):e2308950121. doi: 10.1073/pnas.2308950121. Epub 2024 Aug 12.
