Using Hybrid HMM/DNN Embedding Extractor Models in Computational Paralinguistic Tasks.

Affiliations

Institute of Informatics, University of Szeged, H-6720 Szeged, Hungary.

ELKH-SZTE Research Group on Artificial Intelligence, H-6720 Szeged, Hungary.

Publication information

Sensors (Basel). 2023 May 30;23(11):5208. doi: 10.3390/s23115208.

DOI:10.3390/s23115208
PMID:37299935
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10256007/
Abstract

The field of computational paralinguistics emerged from automatic speech processing, and it covers a wide range of tasks involving different phenomena present in human speech. It focuses on the non-verbal content of human speech, including tasks such as spoken emotion recognition, conflict intensity estimation and sleepiness detection from speech, which have straightforward applications in remote monitoring with acoustic sensors. The two main technical issues in computational paralinguistics are (1) handling varying-length utterances with traditional classifiers and (2) training models on relatively small corpora. In this study, we present a method that combines automatic speech recognition (ASR) and paralinguistic approaches and addresses both of these issues. Specifically, we trained an HMM/DNN hybrid acoustic model on a general ASR corpus and then used it as a source of embeddings that serve as features for several paralinguistic tasks. To convert the local embeddings into utterance-level features, we experimented with five aggregation methods: mean, standard deviation, skewness, kurtosis and the ratio of non-zero activations. Our results show that the proposed feature extraction technique consistently outperforms the widely used x-vector baseline, regardless of the paralinguistic task investigated. Furthermore, the aggregation techniques can also be combined effectively, leading to further improvements depending on the task and on the neural network layer that serves as the source of the local embeddings. Overall, based on our experimental results, the proposed method can be considered a competitive and resource-efficient approach for a wide range of computational paralinguistic tasks.
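The aggregation step is what turns a variable number of frame-level activation vectors into one fixed-length vector per utterance, which is how the approach sidesteps the varying-length problem for traditional classifiers. The following is a minimal pure-Python sketch of the five statistics named in the abstract, computed independently for each unit of the chosen hidden layer and concatenated when several are combined. It is not the authors' implementation. The names `utterance_features`, `_frame_stats` and `AGGREGATORS`, the use of population (biased) moments, excess kurtosis, setting skewness and kurtosis of a constant unit to 0, and the block-wise concatenation order are all illustrative assumptions; extracting the activations from the HMM/DNN acoustic model is outside the scope of the sketch.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical short labels for the five aggregation statistics in the abstract.
AGGREGATORS = ("mean", "std", "skew", "kurt", "nonzero")


def _frame_stats(values: Sequence[float]) -> Dict[str, float]:
    """Statistics of one embedding dimension over all frames of an utterance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population (biased) variance
    std = math.sqrt(var)
    if std > 0.0:
        m3 = sum((v - mean) ** 3 for v in values) / n
        m4 = sum((v - mean) ** 4 for v in values) / n
        skew = m3 / std ** 3        # third standardized moment
        kurt = m4 / var ** 2 - 3.0  # excess kurtosis (assumption: Fisher definition)
    else:
        # Constant unit (e.g. a ReLU that never fires): higher moments are
        # undefined, so this sketch sets them to 0 (assumption).
        skew = kurt = 0.0
    # Share of frames in which the unit produced a non-zero activation.
    nonzero = sum(1 for v in values if v != 0.0) / n
    return {"mean": mean, "std": std, "skew": skew, "kurt": kurt, "nonzero": nonzero}


def utterance_features(
    frames: Sequence[Sequence[float]], use: Sequence[str] = ("mean",)
) -> List[float]:
    """Map a variable-length sequence of frame embeddings to one fixed-length vector.

    frames: one activation vector per frame, taken from a hidden layer.
    use:    which statistics to compute; several are concatenated block-wise.
    """
    if not frames:
        raise ValueError("utterance has no frames")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same embedding size")
    unknown = [name for name in use if name not in AGGREGATORS]
    if unknown:
        raise ValueError(f"unknown aggregation(s): {unknown}")
    per_dim = [_frame_stats([f[j] for f in frames]) for j in range(dim)]
    # Block-wise concatenation: all means first, then all standard deviations, etc.
    return [stats[name] for name in use for stats in per_dim]


if __name__ == "__main__":
    short = [[0.0, 1.0], [2.0, 1.0]]                           # 2 frames, 2 units
    longer = [[0.0, 0.5], [1.0, 0.0], [2.0, 1.0], [3.0, 0.5]]  # 4 frames, 2 units
    print(utterance_features(short, ("mean", "std")))  # [1.0, 1.0, 1.0, 0.0]
    print(len(utterance_features(longer, ("mean", "std"))))  # 4
```

Because every statistic is taken over the time axis, utterances of 2 and 4 frames both yield vectors of length (number of statistics) × (layer width). Keeping the layer and the statistic combination as parameters reflects the abstract's observation that the best combination depends on the task and on the layer used.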

