The Psychology Department, Max Stern Yezreel Valley College, Tel Adashim, Israel.
The Jane Goodall Institute, Max Stern Yezreel Valley College, Tel Adashim, Israel.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit value-like patterns distinct from those of humans and from one another. METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and the population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction while de-emphasizing achievement, power, and security relative to humans. A discriminant analysis successfully differentiated the 4 LLMs' distinct value profiles.
Further examination found that the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring a choice between opposing values. This provided further validation that the models embed distinct motivational value-like constructs that shape their decision-making. CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated that the STBV can effectively characterize value-like infrastructure within LLMs, the substantial divergence from human values raises ethical concerns about deploying these models in mental health applications. The biases toward certain cultural value sets pose risks if the models are integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivational mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
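The scoring step in the methods above — aggregating Likert-scale questionnaire responses into per-dimension value profiles — can be sketched in a few lines. This is a minimal illustration, not the study's actual analysis code: the item-to-value mapping (`ITEM_KEY`) and the sample ratings are hypothetical, whereas the real PVQ-RR assigns 57 items to 19 values via a published scoring key.

```python
# Hypothetical sketch of PVQ-RR-style scoring: average each respondent's
# 1-6 Likert ratings within Schwartz value dimensions.
from statistics import mean

# Illustrative item-to-value mapping (NOT the actual PVQ-RR key).
ITEM_KEY = {
    1: "universalism", 2: "self_direction", 3: "achievement",
    4: "power", 5: "security", 6: "universalism",
}

def score_profile(responses):
    """Return the mean rating per value dimension for one trial."""
    buckets = {}
    for item, rating in responses.items():
        buckets.setdefault(ITEM_KEY[item], []).append(rating)
    return {value: mean(ratings) for value, ratings in buckets.items()}

# One simulated trial (1 = "not like me at all" ... 6 = "very much like me"),
# shaped like the pattern reported for the LLMs: high universalism and
# self-direction, low achievement, power, and security.
trial = {1: 6, 2: 5, 3: 2, 4: 1, 5: 2, 6: 5}
profile = score_profile(trial)
```

Averaging such profiles over the 10 trials per model, and comparing them against the published human norms, would reproduce the benchmarking step described above.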