Large-scale moral machine experiment on large language models.

Authors

Zaim Bin Ahmad Muhammad Shahrul, Takemoto Kazuhiro

Affiliations

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan.

Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia.

Publication information

PLoS One. 2025 May 21;20(5):e0322776. doi: 10.1371/journal.pone.0322776. eCollection 2025.


DOI:10.1371/journal.pone.0322776
PMID:40397922
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12094719/
Abstract

The rapid advancement of Large Language Models (LLMs) and their potential integration into autonomous driving systems necessitates understanding their moral decision-making capabilities. While our previous study examined four prominent LLMs using the Moral Machine experimental framework, the dynamic landscape of LLM development demands a more comprehensive analysis. Here, we evaluate moral judgments across 52 different LLMs, including multiple versions of proprietary models (GPT, Claude, Gemini) and open-source alternatives (Llama, Gemma), to assess their alignment with human moral preferences in autonomous driving scenarios. Using a conjoint analysis framework, we evaluated how closely LLM responses aligned with human preferences in ethical dilemmas and examined the effects of model size, updates, and architecture. Results showed that proprietary models and open-source models exceeding 10 billion parameters demonstrated relatively close alignment with human judgments, with a significant negative correlation between model size and distance from human judgments in open-source models. However, model updates did not consistently improve alignment with human preferences, and many LLMs showed excessive emphasis on specific ethical principles. These findings suggest that while increasing model size may naturally lead to more human-like moral judgments, practical implementation in autonomous driving systems requires careful consideration of the trade-off between judgment quality and computational efficiency. Our comprehensive analysis provides crucial insights for the ethical design of autonomous systems and highlights the importance of considering cultural contexts in AI moral decision-making.
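The comparison described in the abstract (how far each model's preferences lie from human preferences, and whether that distance shrinks with model size) can be illustrated with a minimal sketch. The abstract does not state which distance metric or correlation statistic was used, so Euclidean distance and a Pearson correlation on log10 parameter count are assumptions here. The model names, parameter counts, and preference values are hypothetical toy numbers, not results from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two preference vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("preference vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical preference vector for humans: one value per scenario
# attribute, as a conjoint analysis might produce. Not data from the paper.
human = [0.6, 0.5, 0.4]

# Hypothetical models: name -> (parameter count, preference vector).
models = {
    "toy-2b": (2e9, [0.1, 0.9, -0.2]),
    "toy-9b": (9e9, [0.3, 0.7, 0.1]),
    "toy-70b": (70e9, [0.5, 0.6, 0.3]),
}

log_sizes = []
distances = []
for name, (n_params, prefs) in models.items():
    log_sizes.append(math.log10(n_params))
    distances.append(euclidean_distance(prefs, human))

# A negative r means larger toy models sit closer to the human vector.
r = pearson(log_sizes, distances)
print(f"correlation between log10(size) and distance: {r:.3f}")
```

With these toy values the distance falls as the parameter count rises, so the coefficient is negative. That is only the shape of the relationship the abstract reports for open-source models, not a reproduction of it.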

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/bfa5c4053e99/pone.0322776.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/8f92f0a08d5c/pone.0322776.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/09551364dfc2/pone.0322776.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/9223b6cde322/pone.0322776.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/8e6935311ad5/pone.0322776.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/4eff2b977eaa/pone.0322776.g006.jpg

Similar articles

[1]
Large-scale moral machine experiment on large language models.

PLoS One. 2025-5-21

[2]
The moral machine experiment on large language models.

R Soc Open Sci. 2024-2-7

[3]
AI language model rivals expert ethicist in perceived moral expertise.

Sci Rep. 2025-2-3

[4]
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

JMIR Ment Health. 2024-4-9

[5]
Moral Complexity in Traffic: Advancing the ADC Model for Automated Driving Systems.

Sci Eng Ethics. 2025-1-24

[6]
Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas.

Heliyon. 2024-9-19

[7]
Modeling Morality in 3-D: Decision-Making, Judgment, and Inference.

Top Cogn Sci. 2018-9-14

[8]
Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.

JMIR Med Inform. 2024-9-4

[9]
Open-Source Large Language Models in Radiology: A Review and Tutorial for Practical Research and Clinical Deployment.

Radiology. 2025-1

[10]
A dataset and benchmark for hospital course summarization with adapted large language models.

J Am Med Inform Assoc. 2025-3-1
