
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

Affiliations

The Psychology Department, Max Stern Yezreel Valley College, Tel Adashim, Israel.

The Jane Goodall Institute, Max Stern Yezreel Valley College, Tel Adashim, Israel.

Publication Information

JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.

DOI: 10.2196/55988
PMID: 38593424
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11040439/
Abstract

BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.

OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit value-like patterns distinct from humans and from each other.

METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published PVQ-RR data from a diverse sample of 53,472 individuals across 49 nations. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.

RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and the population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction while de-emphasizing achievement, power, and security relative to humans. A successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring a choice between opposing values, providing further validation that the models embed distinct motivational value-like constructs that shape their decision-making.

CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated that the STBV can effectively characterize this infrastructure, the substantial divergence from human values raises ethical concerns about integrating these models into mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivational mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
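To make the METHODS concrete, here is a minimal Python sketch of the scoring pipeline: repeated questionnaire administrations are turned into value scores and checked for internal consistency. This is an illustration, not the study's code; the VALUE_ITEMS mapping, item counts, and ratings are invented, and only the 6-point response scale and the centering of scores on each respondent's mean rating (the MRAT correction) follow standard PVQ-RR scoring conventions.

```python
# Minimal sketch of PVQ-RR-style scoring and a reliability check, assuming a
# HYPOTHETICAL item-to-value mapping and SIMULATED 6-point ratings.
# Schwartz's scoring convention centers each respondent's value scores on
# their mean rating across all items (MRAT correction).

import numpy as np

# Hypothetical mapping: value -> indices of its items (illustrative only).
VALUE_ITEMS = {
    "universalism":   [0, 1, 2],
    "self_direction": [3, 4, 5],
    "achievement":    [6, 7, 8],
    "power":          [9, 10, 11],
    "security":       [12, 13, 14],
}

def score_profile(ratings: np.ndarray) -> dict[str, float]:
    """Convert raw item ratings (1-6) into MRAT-centered value scores."""
    mrat = ratings.mean()  # respondent's mean rating across all items
    return {v: ratings[idx].mean() - mrat for v, idx in VALUE_ITEMS.items()}

def cronbach_alpha(trials: np.ndarray) -> float:
    """Internal consistency across repeated trials (rows=trials, cols=items)."""
    k = trials.shape[1]
    item_vars = trials.var(axis=0, ddof=1).sum()
    total_var = trials.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Example: 10 simulated trials of a 15-item questionnaire for one model.
rng = np.random.default_rng(0)
trials = rng.integers(1, 7, size=(10, 15)).astype(float)
print(score_profile(trials.mean(axis=0)))      # mean value profile over trials
print(f"alpha = {cronbach_alpha(trials):.2f}")
```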

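The discriminant analysis in RESULTS can be sketched the same way: per-trial value profiles labeled by model are tested for separability. The data below are simulated with an arbitrary per-model offset so the four models are distinguishable, as the study reports; only the analysis structure, using scikit-learn's LinearDiscriminantAnalysis with cross-validation, mirrors the paper.

```python
# Hedged sketch of a discriminant analysis over per-trial value profiles.
# The profiles are SIMULATED with a per-model offset so they are separable;
# this is not the study's data or code.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
models = ["Bard", "Claude 2", "GPT-3.5", "GPT-4"]

# 10 trials per model x 10 value dimensions; each model gets its own mean.
X = np.vstack([rng.normal(loc=0.8 * i, scale=1.0, size=(10, 10))
               for i in range(len(models))])
y = np.repeat(models, 10)

# Cross-validated accuracy well above the 25% chance level indicates the
# four models occupy distinct regions of value space.
lda = LinearDiscriminantAnalysis()
print(f"CV accuracy: {cross_val_score(lda, X, y, cv=5).mean():.2f}")
```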

Figures:
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b97/11040439/4ffbf4b8d2ef/mental_v11i1e55988_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b97/11040439/c02899347b5b/mental_v11i1e55988_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b97/11040439/1c0d5acae690/mental_v11i1e55988_fig3.jpg

Similar Articles

[1] Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values. JMIR Ment Health. 2024-4-9.
[2] Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon. 2024-9-19.
[3] The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform. 2024-5-10.
[4] Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study. JMIR Med Inform. 2024-9-4.
[5] Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study. JMIR Ment Health. 2024-3-18.
[6] Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study. J Med Internet Res. 2023-12-28.
[7] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard. JMIR Med Educ. 2024-2-21.
[8] Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis. BMJ. 2024-3-20.
[9] Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study. JMIR Ment Health. 2024-2-6.
[10] Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models. JMIR Med Educ. 2024-2-13.

Cited By

[1] Evaluation of large language models on mental health: from knowledge test to illness diagnosis. Front Psychiatry. 2025-8-6.
[2] A controlled trial examining large language model conformity in psychiatric assessment using the Asch paradigm. BMC Psychiatry. 2025-5-12.
[3] The Feasibility of Large Language Models in Verbal Comprehension Assessment: Mixed Methods Feasibility Study. JMIR Form Res. 2025-2-24.
[4] The externalization of internal experiences in psychotherapy through generative artificial intelligence: a theoretical, clinical, and ethical analysis. Front Digit Health. 2025-2-4.
[5] Responsible Design, Integration, and Use of Generative AI in Mental Health. JMIR Ment Health. 2025-1-20.
[6] An Ethical Perspective on the Democratization of Mental Health With Generative AI. JMIR Ment Health. 2024-10-17.
[7] The use of Artificial Intelligence in Psychotherapy: Practical and Ethical Aspects. Turk Psikiyatri Derg. 2024-10-14.
[8] Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon. 2024-9-19.
[9] The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ. 2024.
[10] The Artificial Third: A Broad View of the Effects of Introducing Generative Artificial Intelligence on Psychotherapy. JMIR Ment Health. 2024-5-23.

