Human voices communicating trustworthy intent: A demographically diverse speech audio dataset.

Author information

Maltezou-Papastylianou Constantina, Scherer Reinhold, Paulmann Silke

Affiliations

Department of Psychology and Centre for Brain Science, University of Essex, Colchester, CO4 3SQ, UK.

Brain-Computer Interfaces and Neural Engineering Laboratory, School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK.

Publication information

Sci Data. 2025 May 31;12(1):921. doi: 10.1038/s41597-025-05267-3.

Abstract

The multidisciplinary field of voice perception and trustworthiness lacks accessible speech audio datasets that represent a range of speaker demographics, including age, ethnicity, and sex. Existing datasets primarily feature white, younger adult speakers, which limits generalisability. This paper introduces a novel open-access speech audio dataset with 1,152 utterances from 96 untrained speakers across white, black and south Asian backgrounds, divided into younger (N = 60, ages 18-45) and older (N = 36, ages 60+) adults. Each speaker recorded both their natural speech patterns (i.e. "neutral", or no intent) and their attempt to convey trustworthy intent, as they perceive it, during speech production. The dataset is described and evaluated through classification of neutral versus trustworthy speech. Specifically, extracted acoustic and voice quality features were analysed using linear and non-linear classification models, achieving accuracies of around 70%. The dataset aims to close a crucial gap in the existing literature and to provide additional research opportunities that can improve the generalisability and applicability of future research in this field.
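
The evaluation described above (neutral versus trustworthy classification on extracted features) can be illustrated with a minimal sketch. The abstract does not name the specific models, features, or validation scheme, so everything below is an assumption: the data are synthetic, the two features are placeholders for acoustic measures, each speaker's 12 utterances (1,152 / 96) are assumed to split evenly between the two intents, and a hand-written logistic regression stands in for a generic linear classifier. The test speakers are held out entirely so that no speaker appears in both training and test sets.

```python
import math
import random

random.seed(0)

N_SPEAKERS = 96        # from the abstract
UTTS_PER_SPEAKER = 12  # 1,152 utterances / 96 speakers


def make_synthetic_dataset():
    """Return rows of (speaker_id, features, label); label 1 = trustworthy intent.

    Assumption: half of each speaker's utterances are neutral, half trustworthy.
    The features are synthetic placeholders, not real acoustic measurements.
    """
    rows = []
    for speaker in range(N_SPEAKERS):
        baseline = random.gauss(0.0, 1.0)  # hypothetical speaker-specific offset
        for i in range(UTTS_PER_SPEAKER):
            label = i % 2
            f1 = baseline + 0.8 * label + random.gauss(0.0, 1.0)
            f2 = random.gauss(0.0, 1.0) - 0.5 * label
            rows.append((speaker, [f1, f2], label))
    return rows


def split_by_speaker(rows, test_fraction=0.25):
    """Hold out whole speakers so train and test share no voices."""
    speakers = sorted({r[0] for r in rows})
    random.shuffle(speakers)
    test_ids = set(speakers[: int(len(speakers) * test_fraction)])
    train = [r for r in rows if r[0] not in test_ids]
    test = [r for r in rows if r[0] in test_ids]
    return train, test


def train_logistic(rows, epochs=200, lr=0.1):
    """Fit a two-feature logistic regression with full-batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for _, x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def accuracy(rows, w, b):
    """Fraction of rows where the sign of the linear score matches the label."""
    correct = sum(
        1 for _, x, y in rows
        if ((w[0] * x[0] + w[1] * x[1] + b) > 0) == (y == 1)
    )
    return correct / len(rows)


if __name__ == "__main__":
    train_rows, test_rows = split_by_speaker(make_synthetic_dataset())
    weights, bias = train_logistic(train_rows)
    print(f"held-out accuracy: {accuracy(test_rows, weights, bias):.2f}")
```

Splitting by speaker rather than by utterance is a design choice of this sketch, not a detail taken from the abstract; it avoids a classifier scoring well simply by recognising individual voices.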

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b959/12126476/302d780c0eaf/41597_2025_5267_Fig1_HTML.jpg
