AI Psychometrics: Assessing the Psychological Profiles of Large Language Models Through Psychometric Inventories.

Affiliations

Business School, University of Mannheim.

GESIS-Leibniz Institute for the Social Sciences.

Publication information

Perspect Psychol Sci. 2024 Sep;19(5):808-826. doi: 10.1177/17456916231214460. Epub 2024 Jan 2.

DOI:10.1177/17456916231214460

PMID:38165766

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11373167/

Abstract

We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast text corpora on which they are trained. Such corpora contain sediments of the personalities, values, beliefs, and biases of the countless human authors of these texts, which LLMs learn through a complex training process. The traits that LLMs acquire in such a way can potentially influence their behavior, that is, their outputs in downstream tasks and applications in which they are employed, which in turn may have real-world consequences for individuals and social groups. By eliciting LLMs' responses to language-based psychometric inventories, we can bring their traits to light. Psychometric profiling enables researchers to study and compare LLMs in terms of noncognitive characteristics, thereby providing a window into the personalities, values, beliefs, and biases these models exhibit (or mimic). We discuss the history of similar ideas and outline possible psychometric approaches for LLMs. We demonstrate one promising approach, zero-shot classification, for several LLMs and psychometric inventories. We conclude by highlighting open challenges and future avenues of research for AI Psychometrics.
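The zero-shot classification approach mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' actual setup: the Likert labels and their numeric codes, the function names `score_item` and `score_scale`, and the expected-value scoring rule are all hypothetical choices made for this example. The classifier is passed in as a plain callable that returns a dict with aligned `"labels"` and `"scores"` lists, which is the shape produced by the Hugging Face `zero-shot-classification` pipeline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical response options and numeric codes (an assumption for this
# sketch; real inventories define their own response scales).
LIKERT: Dict[str, int] = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# A classifier maps (text, candidate_labels) to a dict holding "labels" and
# "scores", two lists aligned index by index.
Classifier = Callable[[str, Sequence[str]], Dict[str, List]]


def score_item(classifier: Classifier, item: str, reverse_keyed: bool = False) -> float:
    """Return the probability-weighted Likert score for one inventory item."""
    result = classifier(item, list(LIKERT))
    total = sum(result["scores"])
    expected = sum(
        LIKERT[label] * score
        for label, score in zip(result["labels"], result["scores"])
    ) / total
    if reverse_keyed:
        # Mirror the score around the scale midpoint for reverse-keyed items.
        expected = (min(LIKERT.values()) + max(LIKERT.values())) - expected
    return expected


def score_scale(classifier: Classifier, items: Sequence[Tuple[str, bool]]) -> float:
    """Average item scores; each entry is (item_text, reverse_keyed)."""
    scores = [score_item(classifier, text, rev) for text, rev in items]
    return sum(scores) / len(scores)
```

With the `transformers` library installed, a real model could be plugged in by wrapping a pipeline, for example `pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")` followed by `classifier = lambda text, labels: pipe(text, candidate_labels=labels)`. The model name here is only an example. Taking the expected value over all response options, rather than only the top label, keeps some of the model's uncertainty in the score, though other scoring rules are equally plausible.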

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/079e/11373167/d42db5d00ced/10.1177_17456916231214460-fig1.jpg
