KBES: A dataset for realistic Bangla speech emotion recognition with intensity level.

Authors

Billah Md Masum, Sarker Md Likhon, Akhand M A H

Affiliation

Department of Computer Science and Engineering, Khulna University of Engineering & Technology (KUET), Bangladesh.

Publication information

Data Brief. 2023 Oct 31;51:109741. doi: 10.1016/j.dib.2023.109741. eCollection 2023 Dec.

Abstract

Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. Owing to its socio-cultural and commercial importance, SER is an emerging research area that uses machine learning and deep learning techniques. An appropriate dataset is an essential resource for SER studies in a particular language. Although Bangla is one of the most widely spoken languages in the world, there is a clear lack of Bangla SER datasets. The few that exist consist of only a small number of dialogs performed by very few actors, which makes them unsuitable for real-world applications. Moreover, the existing datasets do not consider the intensity level of emotions, even though the intensity of a specific emotional expression, such as anger or sadness, plays a crucial role in social behavior. Therefore, this study develops a realistic Bangla speech dataset called the KUET Bangla Emotional Speech (KBES) dataset. The dataset consists of 900 audio signals (i.e., speech dialogs) from 35 actors (20 females and 15 males) spanning diverse age ranges. The speech dialogs are sourced from Bangla telefilms, dramas, TV series, and web series. There are five emotional categories: Neutral, Happy, Sad, Angry, and Disgust. Except for Neutral, the samples of each emotion are divided into two intensity levels: Low and High. A key distinction of the dataset is that its speech dialogs are almost all unique and come from a relatively large number of actors, whereas existing datasets (such as SUBESCO and BanglaSER) contain a few pre-defined dialogs spoken repeatedly by a few actors or research volunteers in a laboratory environment. Finally, the KBES dataset is framed as a nine-class problem that classifies emotions into nine categories: Neutral, Happy (Low), Happy (High), Sad (Low), Sad (High), Angry (Low), Angry (High), Disgust (Low), and Disgust (High).
The dataset is balanced, with 100 samples for each of the nine classes; the 100 samples of each class are also gender-balanced, with 50 from male and 50 from female actors. Compared with existing SER datasets, the developed dataset appears to be more realistic.
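The nine-class labelling and the stated balance can be sketched in Python. This is a minimal illustration that assumes nothing about the actual KBES file layout: the class-name strings, the `label`/`gender` fields, and the helpers `kbes_classes`, `expected_metadata`, and `check_balance` are hypothetical. The sketch builds the nine classes (Neutral plus each remaining emotion at Low and High intensity) and checks that 9 × 100 = 900 samples split 50/50 by gender within each class.

```python
from collections import Counter
from itertools import product

# Emotion categories and intensity levels as listed in the abstract.
EMOTIONS = ["Neutral", "Happy", "Sad", "Angry", "Disgust"]
INTENSITIES = ["Low", "High"]


def kbes_classes():
    """Return the nine class names: Neutral (no intensity) plus each
    other emotion at Low and High intensity, in the abstract's order."""
    classes = ["Neutral"]
    for emotion, level in product(EMOTIONS[1:], INTENSITIES):
        classes.append(f"{emotion} ({level})")
    return classes


CLASSES = kbes_classes()
# Hypothetical integer encoding for a nine-way classifier.
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}


def expected_metadata(per_class=100):
    """Build hypothetical metadata rows matching the stated balance:
    per_class samples per class, half from male and half from female actors."""
    rows = []
    for name in CLASSES:
        for i in range(per_class):
            gender = "male" if i < per_class // 2 else "female"
            rows.append({"label": name, "gender": gender})
    return rows


def check_balance(rows):
    """Count samples per class and per (class, gender) pair."""
    by_class = Counter(r["label"] for r in rows)
    by_class_gender = Counter((r["label"], r["gender"]) for r in rows)
    return by_class, by_class_gender


if __name__ == "__main__":
    rows = expected_metadata()
    by_class, by_class_gender = check_balance(rows)
    print(len(CLASSES), len(rows))  # 9 classes, 900 samples
```

Applied to real metadata, `check_balance` would flag any class or gender subgroup whose count departs from the 100 and 50 figures reported for KBES.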

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/16ae/10641593/3faa46e98151/gr1.jpg
