


Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas.

Authors

Hadar-Shoval Dorit, Asraf Kfir, Shinan-Altman Shiri, Elyoseph Zohar, Levkovich Inbar

Affiliations

The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel.

The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Israel.

Publication

Heliyon. 2024 Sep 19;10(18):e38056. doi: 10.1016/j.heliyon.2024.e38056. eCollection 2024 Sep 30.

DOI: 10.1016/j.heliyon.2024.e38056
PMID: 39381244
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11458949/
Abstract

OBJECTIVE

This article uses the framework of Schwartz's values theory to examine whether the embedded values-like profile within large language models (LLMs) impacts ethical decision-making in dilemmas faced in primary care. It specifically aims to evaluate whether each LLM exhibits a distinct values-like profile, assess its alignment with general population values, and determine whether latent values influence clinical recommendations.

METHODS

The Portrait Values Questionnaire-Revised (PVQ-RR) was administered to each LLM (Claude, Bard, GPT-3.5, and GPT-4) 20 times to ensure reliable and valid responses. Their responses were compared to a benchmark derived from an international sample of over 53,000 culturally diverse respondents who completed the PVQ-RR. Four vignettes depicting prototypical professional quandaries involving conflicts between competing values were presented to the LLMs. The option selected by each LLM and the strength of its recommendation were evaluated to determine whether the underlying values-like profile impacts output.
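The aggregation step this methodology implies can be sketched as follows. This is a minimal illustration, not the study's actual code: the value names, scores, number of runs, and the mean-centering step (Schwartz's recommended ipsatization for PVQ data) are assumptions made for the example.

```python
from statistics import mean

def values_profile(runs: list[dict[str, float]]) -> dict[str, float]:
    """Average each value's score across repeated questionnaire
    administrations, then mean-center the profile so scores express
    relative priorities rather than raw response tendencies."""
    values = runs[0].keys()
    averaged = {v: mean(run[v] for run in runs) for v in values}
    grand_mean = mean(averaged.values())
    return {v: score - grand_mean for v, score in averaged.items()}

def rank_order(profile: dict[str, float]) -> list[str]:
    """Values ordered from most to least endorsed, for comparison
    against a population benchmark's ranking."""
    return sorted(profile, key=profile.get, reverse=True)

# Two illustrative runs of one model (the study used 20 per LLM).
runs = [
    {"universalism": 5.2, "self_direction": 5.0, "power": 2.1, "tradition": 2.8},
    {"universalism": 5.4, "self_direction": 4.8, "power": 1.9, "tradition": 3.0},
]
profile = values_profile(runs)
print(rank_order(profile))  # → ['universalism', 'self_direction', 'tradition', 'power']
```

Mean-centering makes profiles comparable across models that differ in overall response level, which is why rankings rather than raw scores are the natural unit of comparison with the human benchmark.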

RESULTS

Each LLM demonstrated a unique values-like profile. Universalism and self-direction were prioritized, while power and tradition were assigned less importance than population benchmarks, suggesting potential Western-centric biases. Four clinical vignettes involving value conflicts were presented to the LLMs. Preliminary indications suggested that the embedded values-like profiles influence recommendations. Significant variance in confidence strength regarding chosen recommendations emerged between models, suggesting that further vetting is required before the LLMs can be relied on as judgment aids. However, the overall selection of preferences aligned with intrinsic value hierarchies.

CONCLUSION

The distinct values-like profiles embedded within LLMs shape ethical decision-making, which carries implications for their integration in primary care settings serving diverse populations. For context-appropriate, equitable delivery of AI-assisted healthcare globally, it is essential that LLMs are tailored to align with cultural outlooks.


Figures (from PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b671/11458949/fc7c7fcd4e97/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b671/11458949/d893a3f0c347/gr4.jpg

Similar articles

1
Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas.
Heliyon. 2024 Sep 19;10(18):e38056. doi: 10.1016/j.heliyon.2024.e38056. eCollection 2024 Sep 30.
2
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
3
The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review.
JMIR Med Inform. 2024 May 10;12:e53787. doi: 10.2196/53787.
4
Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
5
Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study.
JMIR Ment Health. 2024 Mar 18;11:e53043. doi: 10.2196/53043.
6
Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
Cancers (Basel). 2024 Aug 12;16(16):2830. doi: 10.3390/cancers16162830.
7
Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals.
J Med Internet Res. 2024 Apr 25;26:e56764. doi: 10.2196/56764.
8
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
9
Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.
Cureus. 2024 Mar 11;16(3):e55991. doi: 10.7759/cureus.55991. eCollection 2024 Mar.
10
Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources.
Surg Endosc. 2024 May;38(5):2522-2532. doi: 10.1007/s00464-024-10720-2. Epub 2024 Mar 12.

Cited by

1
A controlled trial examining large language model conformity in psychiatric assessment using the Asch paradigm.
BMC Psychiatry. 2025 May 12;25(1):478. doi: 10.1186/s12888-025-06912-2.
2
Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals.
Eur J Investig Health Psychol Educ. 2025 Jan 18;15(1):9. doi: 10.3390/ejihpe15010009.
3
Evaluating BERT-based and Large Language Models for Suicide Detection, Prevention, and Risk Assessment: A Systematic Review.
J Med Syst. 2024 Dec 30;48(1):113. doi: 10.1007/s10916-024-02134-3.

References

1
Effects of interacting with a large language model compared with a human coach on the clinical diagnostic process and outcomes among fourth-year medical students: study protocol for a prospective, randomised experiment using patient vignettes.
BMJ Open. 2024 Jul 18;14(7):e087469. doi: 10.1136/bmjopen-2024-087469.
2
Clinical and Surgical Applications of Large Language Models: A Systematic Review.
J Clin Med. 2024 May 22;13(11):3041. doi: 10.3390/jcm13113041.
3
Patient perspectives on informed consent for medical AI: A web-based experiment.
Digit Health. 2024 Apr 30;10:20552076241247938. doi: 10.1177/20552076241247938. eCollection 2024 Jan-Dec.
4
Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content.
J Med Internet Res. 2024 Apr 25;26:e55847. doi: 10.2196/55847.
5
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
6
Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study.
JMIR Ment Health. 2024 Feb 6;11:e54369. doi: 10.2196/54369.
7
Assessing prognosis in depression: comparing perspectives of AI models, mental health professionals and the general public.
Fam Med Community Health. 2024 Jan 9;12(Suppl 1):e002583. doi: 10.1136/fmch-2023-002583.
8
Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians.
Fam Med Community Health. 2023 Sep;11(4). doi: 10.1136/fmch-2023-002391.
9
Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review.
Health Promot Perspect. 2023 Sep 11;13(3):183-191. doi: 10.34172/hpp.2023.22. eCollection 2023.
10
Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study.
JMIR Ment Health. 2023 Sep 20;10:e51232. doi: 10.2196/51232.