


Large-scale moral machine experiment on large language models.

Authors

Muhammad Shahrul Zaim Bin Ahmad, Kazuhiro Takemoto

Affiliations

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan.

Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia.

Publication Information

PLoS One. 2025 May 21;20(5):e0322776. doi: 10.1371/journal.pone.0322776. eCollection 2025.

DOI: 10.1371/journal.pone.0322776
PMID: 40397922
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12094719/
Abstract

The rapid advancement of Large Language Models (LLMs) and their potential integration into autonomous driving systems necessitates understanding their moral decision-making capabilities. While our previous study examined four prominent LLMs using the Moral Machine experimental framework, the dynamic landscape of LLM development demands a more comprehensive analysis. Here, we evaluate moral judgments across 52 different LLMs, including multiple versions of proprietary models (GPT, Claude, Gemini) and open-source alternatives (Llama, Gemma), to assess their alignment with human moral preferences in autonomous driving scenarios. Using a conjoint analysis framework, we evaluated how closely LLM responses aligned with human preferences in ethical dilemmas and examined the effects of model size, updates, and architecture. Results showed that proprietary models and open-source models exceeding 10 billion parameters demonstrated relatively close alignment with human judgments, with a significant negative correlation between model size and distance from human judgments in open-source models. However, model updates did not consistently improve alignment with human preferences, and many LLMs showed excessive emphasis on specific ethical principles. These findings suggest that while increasing model size may naturally lead to more human-like moral judgments, practical implementation in autonomous driving systems requires careful consideration of the trade-off between judgment quality and computational efficiency. Our comprehensive analysis provides crucial insights for the ethical design of autonomous systems and highlights the importance of considering cultural contexts in AI moral decision-making.
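The abstract's core quantitative step — scoring how far each model's elicited preferences sit from the human baseline, then correlating that distance with model size — can be sketched as follows. This is an illustrative sketch only: the attribute names, preference values, and model entries below are invented, not taken from the paper (whose human baseline comes from the Moral Machine's conjoint-analysis estimates).

```python
import math

# Hypothetical human preference vector in the style of the Moral Machine's
# attribute-level estimates (attribute -> preference strength). Values invented.
HUMAN = {"sparing_humans": 0.58, "sparing_more": 0.49, "sparing_young": 0.42}

def distance(prefs, baseline=HUMAN):
    """Euclidean distance between a model's preference vector and the baseline."""
    return math.sqrt(sum((prefs[k] - baseline[k]) ** 2 for k in baseline))

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical open-source models: (name, parameter count, preference vector).
models = [
    ("tiny-llm-1b", 1e9,  {"sparing_humans": 0.10, "sparing_more": 0.90, "sparing_young": -0.20}),
    ("mid-llm-10b", 1e10, {"sparing_humans": 0.40, "sparing_more": 0.60, "sparing_young": 0.20}),
    ("big-llm-70b", 7e10, {"sparing_humans": 0.55, "sparing_more": 0.50, "sparing_young": 0.40}),
]

log_sizes = [math.log10(params) for _, params, _ in models]
distances = [distance(prefs) for _, _, prefs in models]

# The paper reports a significant negative correlation for open-source models:
# larger models sit closer to human judgments, so r should come out negative here.
r = pearson(log_sizes, distances)
```

The sketch only shows the shape of the computation: a per-model distance from a human preference vector, correlated against log parameter count across the open-source models.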


Figures (PMC):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/bfa5c4053e99/pone.0322776.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/8f92f0a08d5c/pone.0322776.g002.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/09551364dfc2/pone.0322776.g003.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/9223b6cde322/pone.0322776.g004.jpg
Fig 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/8e6935311ad5/pone.0322776.g005.jpg
Fig 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b39/12094719/4eff2b977eaa/pone.0322776.g006.jpg

Similar Articles

1. Large-scale moral machine experiment on large language models.
   PLoS One. 2025 May 21;20(5):e0322776. doi: 10.1371/journal.pone.0322776.
2. The moral machine experiment on large language models.
   R Soc Open Sci. 2024 Feb 7;11(2):231393. doi: 10.1098/rsos.231393.
3. AI language model rivals expert ethicist in perceived moral expertise.
   Sci Rep. 2025 Feb 3;15(1):4084. doi: 10.1038/s41598-025-86510-0.
4. Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
   JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
5. Moral Complexity in Traffic: Advancing the ADC Model for Automated Driving Systems.
   Sci Eng Ethics. 2025 Jan 24;31(1):5. doi: 10.1007/s11948-025-00528-1.
6. Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas.
   Heliyon. 2024 Sep 19;10(18):e38056. doi: 10.1016/j.heliyon.2024.e38056.
7. Modeling Morality in 3-D: Decision-Making, Judgment, and Inference.
   Top Cogn Sci. 2019 Apr;11(2):409-432. doi: 10.1111/tops.12382.
8. Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
   JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
9. Open-Source Large Language Models in Radiology: A Review and Tutorial for Practical Research and Clinical Deployment.
   Radiology. 2025 Jan;314(1):e241073. doi: 10.1148/radiol.241073.
10. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.

References Cited in This Article

1. Closing the gap between open source and commercial large language models for medical evidence summarization.
   NPJ Digit Med. 2024 Sep 9;7(1):239. doi: 10.1038/s41746-024-01239-w.
2. The moral machine experiment on large language models.
   R Soc Open Sci. 2024 Feb 7;11(2):231393. doi: 10.1098/rsos.231393.
3. ChatGPT's inconsistent moral advice influences users' judgment.
   Sci Rep. 2023 Apr 6;13(1):4569. doi: 10.1038/s41598-023-31341-0.
4. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns.
   Healthcare (Basel). 2023 Mar 19;11(6):887. doi: 10.3390/healthcare11060887.
5. Ethical dilemmas are really important to potential adopters of autonomous vehicles.
   Ethics Inf Technol. 2021;23(4):657-673. doi: 10.1007/s10676-021-09605-y.
6. Life and death decisions of autonomous vehicles.
   Nature. 2020 Mar;579(7797):E1-E2. doi: 10.1038/s41586-020-1987-4.
7. 'Moral machine' experiment is no basis for policymaking.
   Nature. 2019 Mar;567(7746):31. doi: 10.1038/d41586-019-00766-x.
8. The Moral Machine experiment.
   Nature. 2018 Nov;563(7729):59-64. doi: 10.1038/s41586-018-0637-6.
9. Human Decisions in Moral Dilemmas are Largely Described by Utilitarianism: Virtual Car Driving Study Provides Guidelines for Autonomous Driving Vehicles.
   Sci Eng Ethics. 2019 Apr;25(2):399-418. doi: 10.1007/s11948-018-0020-x.
10. The social dilemma of autonomous vehicles.
   Science. 2016 Jun 24;352(6293):1573-6. doi: 10.1126/science.aaf2654.