
Speech emotion recognition based on a stacked autoencoders optimized by PSO based grass fibrous root optimization.

Authors

Zeng Chi, Li Jialing, Habibi Abbas

Affiliations

Xinyang Vocational and Technical College, Xinyang, 464000, Henan, China.

School of Artificial Intelligence, Chongqing Youth Vocational & Technical College, Chongqing, 401320, China.

Publication information

Sci Rep. 2025 Jul 18;15(1):26158. doi: 10.1038/s41598-025-08703-x.

DOI:10.1038/s41598-025-08703-x
PMID:40681606
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12274284/
Abstract

Effective speech emotion recognition (SER) poses a significant challenge due to the intricate and subjective nature of human emotions. Recognizing emotional states accurately from speech signals has a broad spectrum of practical applications, such as healthcare, human-computer interaction, and social robotics. This study introduces an innovative approach that merges deep learning with metaheuristic algorithms to boost the efficiency of SER systems. Specifically, a stacked autoencoder (SAE) serves as the primary model, and its performance is fine-tuned using a nature-inspired hybrid algorithm that combines particle swarm optimization (PSO) with Grass Fibrous Root Optimization (GFRO). The proposed model extracts spectral and pitch features from speech signals, including spectral crest, spectral entropy, spectral flux, and harmonic ratio, to capture emotional cues. The model is evaluated on a standard emotion recognition dataset and compared with several state-of-the-art models, including Convolutional Neural Network (CNN), Support Vector Machine (SVM), Deep Learning (DL), CNN with Iterative Neighborhood Component Analysis (CNN/INCA), and VGG-16; it achieves high accuracy in identifying various emotional states.
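
To make the spectral features named in the abstract more concrete, the following is a minimal sketch of spectral crest, spectral entropy, and spectral flux computed per frame with NumPy. The formulas are common textbook definitions and are an assumption here; the paper may use different normalizations. The function names, the frame length of 512, and the hop of 256 are hypothetical choices for illustration, and the harmonic ratio is omitted for brevity. The input signal is assumed to be at least one frame long.

```python
import numpy as np

def frame_spectra(signal, frame_len=512, hop=256):
    """Magnitude spectrum of each Hann-windowed frame, shape (frames, bins)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_crest(mag, eps=1e-12):
    """Peak-to-mean ratio of each frame's spectrum (large for tonal frames)."""
    return mag.max(axis=1) / (mag.mean(axis=1) + eps)

def spectral_entropy(mag, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled by log2(bins)."""
    power = mag ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log2(p + eps)).sum(axis=1) / np.log2(mag.shape[1])

def spectral_flux(mag):
    """Euclidean distance between the spectra of consecutive frames."""
    return np.sqrt((np.diff(mag, axis=0) ** 2).sum(axis=1))

# Illustrative inputs (not from the paper): a pure tone and white noise.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr)

for name, sig in (("tone", tone), ("noise", noise)):
    mag = frame_spectra(sig)
    print(
        name,
        spectral_crest(mag).mean(),
        spectral_entropy(mag).mean(),
        spectral_flux(mag).mean(),
    )
```

In this sketch a steady tone should show a high crest, low entropy, and low flux, while white noise shows the opposite pattern. Per-frame values like these would typically be stacked into a feature vector before being passed to a model such as an SAE.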


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/d1acaeb94d65/41598_2025_8703_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/e138a9d54817/41598_2025_8703_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/228538957464/41598_2025_8703_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/60e232d71248/41598_2025_8703_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/757dce4bb2ca/41598_2025_8703_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/9d575f99831b/41598_2025_8703_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/3b2dba6df3a3/41598_2025_8703_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc0/12274284/dc0266f7f966/41598_2025_8703_Fig8_HTML.jpg

Similar articles

1
Speech emotion recognition based on a stacked autoencoders optimized by PSO based grass fibrous root optimization.
Sci Rep. 2025 Jul 18;15(1):26158. doi: 10.1038/s41598-025-08703-x.
2
Facial Emotion Recognition of 16 Distinct Emotions From Smartphone Videos: Comparative Study of Machine Learning and Human Performance.
J Med Internet Res. 2025 Jul 2;27:e68942. doi: 10.2196/68942.
3
Systematic Review of Emotion Detection with Computer Vision and Deep Learning.
Sensors (Basel). 2024 May 28;24(11):3484. doi: 10.3390/s24113484.
4
An intelligent emotion prediction system using improved sand cat optimization technique based on EEG signals.
Sci Rep. 2025 Mar 13;15(1):8782. doi: 10.1038/s41598-025-89904-2.
5
A Deep Neural Network Framework for Dynamic Two-Handed Indian Sign Language Recognition in Hearing and Speech-Impaired Communities.
Sensors (Basel). 2025 Jun 11;25(12):3652. doi: 10.3390/s25123652.
6
Feature and classifier-level domain adaptation in DistilHuBERT for cross-corpus speech emotion recognition.
Comput Biol Med. 2025 Aug;194:110510. doi: 10.1016/j.compbiomed.2025.110510. Epub 2025 Jun 6.
7
A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
8
Enhanced AlexNet with Gabor and Local Binary Pattern Features for Improved Facial Emotion Recognition.
Sensors (Basel). 2025 Jun 19;25(12):3832. doi: 10.3390/s25123832.
9
Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation.
Comput Biol Med. 2024 Sep;179:108841. doi: 10.1016/j.compbiomed.2024.108841. Epub 2024 Jul 12.
10
Development and Validation of a Convolutional Neural Network Model to Predict a Pathologic Fracture in the Proximal Femur Using Abdomen and Pelvis CT Images of Patients With Advanced Cancer.
Clin Orthop Relat Res. 2023 Nov 1;481(11):2247-2256. doi: 10.1097/CORR.0000000000002771. Epub 2023 Aug 23.

References cited in this article

1
Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning.
Sensors (Basel). 2022 Mar 19;22(6):2378. doi: 10.3390/s22062378.
2
Teamwork Optimization Algorithm: A New Optimization Approach for Function Minimization/Maximization.
Sensors (Basel). 2021 Jul 3;21(13):4567. doi: 10.3390/s21134567.