Maltezou-Papastylianou Constantina, Scherer Reinhold, Paulmann Silke
Department of Psychology and Centre for Brain Science, University of Essex, Colchester, CO4 3SQ, UK.
Brain-Computer Interfaces and Neural Engineering Laboratory, School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK.
Sci Data. 2025 May 31;12(1):921. doi: 10.1038/s41597-025-05267-3.
The multidisciplinary field of voice perception and trustworthiness lacks accessible speech audio datasets that represent diverse speaker demographics, including age, ethnicity, and sex. Existing datasets primarily feature white, younger adult speakers, which limits generalisability. This paper introduces a novel open-access speech audio dataset with 1,152 utterances from 96 untrained speakers of white, black and South Asian backgrounds, divided into younger (N = 60, ages 18-45) and older (N = 36, ages 60+) adults. Each speaker recorded both their natural speech patterns (i.e. "neutral", with no intent) and their attempt to convey trustworthy intent, as they perceive it, during speech production. The dataset is described and evaluated by classifying neutral versus trustworthy speech. Specifically, extracted acoustic and voice quality features were analysed using linear and non-linear classification models, achieving accuracies of around 70%. This dataset aims to close a crucial gap in the existing literature and to provide additional research opportunities that can contribute to the generalisability and applicability of future research results in this field.