Can Large Language Models Simulate Spoken Human Conversations?

Authors

Mayor Eric, Bietti Lucas M, Bangerter Adrian

Affiliations

Department of Psychology, University of Basel.

Department of Psychology, Norwegian University of Science and Technology.

Publication

Cogn Sci. 2025 Sep;49(9):e70106. doi: 10.1111/cogs.70106.

Abstract

Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat-based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT-4 and Claude Sonnet 3.5, and two open-source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants' instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. LLM conversations exhibited exaggerated alignment (and an increase in alignment as conversation unfolded) relative to human conversations, different and often inappropriate use of coordination markers, and were dissimilar to human conversations in openings and closings. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are both qualitatively and quantitatively different from those of humans. This issue may evolve with better LLMs and more training on spoken conversation, but may also result from key differences between spoken conversation and chat.
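
For readers unfamiliar with the alignment measures mentioned in the abstract, the sketch below shows one simple way lexical alignment between adjacent turns could be quantified, using Jaccard word overlap. This is an illustrative assumption, not the metric used in the study, and the sample turns are invented; it only demonstrates how repetition-heavy exchanges (as reported for LLM transcripts) would score higher than looser human speech.

```python
# Illustrative sketch only: a toy lexical alignment score between adjacent
# conversational turns, measured as Jaccard overlap of word sets. This is
# NOT the metric used by Mayor, Bietti, and Bangerter (2025).

import re


def tokenize(turn: str) -> set[str]:
    """Lowercase a turn and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", turn.lower()))


def lexical_alignment(transcript: list[str]) -> float:
    """Mean Jaccard word overlap between each turn and the turn before it."""
    overlaps = []
    for prev, curr in zip(transcript, transcript[1:]):
        a, b = tokenize(prev), tokenize(curr)
        if a | b:
            overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


if __name__ == "__main__":
    # Invented example turns, loosely styled on casual telephone speech.
    human_like = [
        "so do you do much recycling at home",
        "uh not really we mostly just do the cans",
        "yeah cans are easy the paper is the hassle",
    ]
    # Invented example turns that echo each other's wording heavily.
    llm_like = [
        "Recycling at home is such an important habit, don't you think?",
        "Absolutely, recycling at home is such an important habit!",
        "I completely agree, recycling at home really is important.",
    ]
    print(f"human-like transcript: {lexical_alignment(human_like):.2f}")
    print(f"LLM-like transcript:   {lexical_alignment(llm_like):.2f}")
```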
