Barrington Sarah, Cooper Emily A, Farid Hany
School of Information, University of California, Berkeley, CA, 94720, USA.
Herbert Wertheim School of Optometry, University of California, Berkeley, CA, 94720, USA.
Sci Rep. 2025 Mar 31;15(1):11004. doi: 10.1038/s41598-025-94170-3.
As generative artificial intelligence (AI) continues its ballistic trajectory, everything from text to audio, image, and video generation continues to improve at mimicking human-generated content. Through a series of perceptual studies, we report on the realism of AI-generated voices in terms of identity matching and naturalness. We find human participants cannot consistently identify recordings of AI-generated voices. Specifically, participants perceived the identity of an AI-generated voice to be the same as its real counterpart approximately [Formula: see text] of the time, and correctly identified a voice as AI generated only about [Formula: see text] of the time.