Large-scale moral machine experiment on large language models.

Author information

Zaim Bin Ahmad Muhammad Shahrul, Takemoto Kazuhiro

Affiliations

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan.

Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia.

Publication information

PLoS One. 2025 May 21;20(5):e0322776. doi: 10.1371/journal.pone.0322776. eCollection 2025.

DOI:10.1371/journal.pone.0322776

PMID:40397922

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12094719/

Abstract

The rapid advancement of Large Language Models (LLMs) and their potential integration into autonomous driving systems necessitates understanding their moral decision-making capabilities. While our previous study examined four prominent LLMs using the Moral Machine experimental framework, the dynamic landscape of LLM development demands a more comprehensive analysis. Here, we evaluate moral judgments across 52 different LLMs, including multiple versions of proprietary models (GPT, Claude, Gemini) and open-source alternatives (Llama, Gemma), to assess their alignment with human moral preferences in autonomous driving scenarios. Using a conjoint analysis framework, we evaluated how closely LLM responses aligned with human preferences in ethical dilemmas and examined the effects of model size, updates, and architecture. Results showed that proprietary models and open-source models exceeding 10 billion parameters demonstrated relatively close alignment with human judgments, with a significant negative correlation between model size and distance from human judgments in open-source models. However, model updates did not consistently improve alignment with human preferences, and many LLMs showed excessive emphasis on specific ethical principles. These findings suggest that while increasing model size may naturally lead to more human-like moral judgments, practical implementation in autonomous driving systems requires careful consideration of the trade-off between judgment quality and computational efficiency. Our comprehensive analysis provides crucial insights for the ethical design of autonomous systems and highlights the importance of considering cultural contexts in AI moral decision-making.
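The comparison described in the abstract, distance from human judgments and its correlation with model size, can be illustrated with a minimal sketch. The abstract does not state which distance metric or correlation coefficient was used, so Euclidean distance and Pearson correlation on log-scaled parameter counts are assumptions here. The model names and all numbers are made-up placeholders, not values from the study.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length preference vectors."""
    if len(a) != len(b):
        raise ValueError("preference vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical preference vectors (one entry per moral attribute).
# Placeholder numbers for illustration only; NOT results from the paper.
human = [0.6, 0.5, 0.3]
models = {
    # name: (parameter count in billions, preference vector)
    "toy-2b": (2, [0.1, 0.9, 0.0]),
    "toy-9b": (9, [0.3, 0.8, 0.1]),
    "toy-70b": (70, [0.5, 0.6, 0.2]),
}

# Log-scale the sizes (an assumption) and measure each model's distance to humans.
sizes = [math.log10(size) for size, _ in models.values()]
distances = [euclidean_distance(prefs, human) for _, prefs in models.values()]

# A negative r means larger models sit closer to the human preference vector.
r = pearson(sizes, distances)
print(f"Pearson r between log10(size) and distance: {r:.3f}")
```

In this toy setup the distance shrinks as the parameter count grows, so `r` is negative, which mirrors the direction of the relationship the abstract reports for open-source models without reproducing its actual values.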
