
Detecting paralinguistic events in audio stream using context in features and probabilistic decisions.

Author Information

Gupta Rahul, Audhkhasi Kartik, Lee Sungbok, Narayanan Shrikanth

Affiliations

Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Avenue, Los Angeles, CA 90089, USA.

IBM Thomas J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA.

Publication Information

Comput Speech Lang. 2016 Mar;36:72-92. doi: 10.1016/j.csl.2015.08.003. Epub 2015 Sep 11.

Abstract

Non-verbal communication involves the encoding, transmission and decoding of non-lexical cues and is realized through vocal (e.g. prosody) or visual (e.g. gaze, body language) channels during conversation. These cues serve to maintain conversational flow, express emotions, and mark personality and interpersonal attitude. In particular, non-verbal cues in speech such as paralanguage and non-verbal vocal events (e.g. laughter, sighs, cries) are used to nuance meaning and convey emotions, mood and attitude. For instance, laughter is associated with affective expressions, while fillers (e.g. um, ah) are used to hold the floor during a conversation. In this paper we present an automatic non-verbal vocal event detection system focused on the detection of laughter and fillers. We extend our system presented at the Interspeech 2013 Social Signals Sub-challenge (the winning entry in that challenge) for frame-wise event detection and test several schemes for incorporating local context during detection. Specifically, we incorporate context at two separate levels in our system: (i) the raw frame-wise features and (ii) the output decisions. Furthermore, our system processes the output probabilities using a few heuristic rules in order to reduce erroneous frame-based predictions. Our overall system achieves an Area Under the Receiver Operating Characteristic curve of 95.3% for detecting laughter and 90.4% for fillers on the test set drawn from the data specifications of the Interspeech 2013 Social Signals Sub-challenge. We perform further analysis to understand the interrelation between the features and the obtained results. Specifically, we conduct a feature sensitivity analysis and correlate it with each feature's standalone performance. The observations suggest that the trained system is more sensitive to features carrying higher discriminability, with implications for better system design.
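
The abstract names two places where local context enters the pipeline, the raw frame-wise features and the frame-wise output probabilities, with the probabilities additionally post-processed by heuristic rules before AUC scoring. The Python sketch below only illustrates that general idea under stated assumptions: the context window sizes, the median-filter post-processing, the 20-dimensional feature vectors and the synthetic labels are illustrative choices, not the configuration used in the paper.

import numpy as np
from sklearn.metrics import roc_auc_score

def stack_context(features, left=5, right=5):
    # features: (num_frames, num_dims) frame-wise acoustic features.
    # Returns (num_frames, num_dims * (left + 1 + right)): each row is the
    # frame concatenated with its 'left' preceding and 'right' following
    # frames; edges are padded by repeating the first/last frame.
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    windows = [padded[i:i + num_frames] for i in range(left + 1 + right)]
    return np.hstack(windows)

def smooth_decisions(frame_probs, window=11):
    # Median-filter frame-wise event probabilities to suppress isolated
    # spurious frames (a stand-in for the paper's heuristic post-processing).
    half = window // 2
    padded = np.pad(frame_probs, half, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(frame_probs))])

# Illustrative run on synthetic frame-wise labels and classifier outputs.
rng = np.random.default_rng(0)
context_features = stack_context(rng.random((2000, 20)))  # 20-dim frames -> 220-dim stacked vectors
labels = (rng.random(2000) < 0.1).astype(int)             # 1 = event frame (synthetic)
raw_probs = 0.6 * labels + 0.4 * rng.random(2000)         # noisy frame-wise probabilities (synthetic)
smoothed = smooth_decisions(raw_probs)
print("AUC before smoothing:", roc_auc_score(labels, raw_probs))
print("AUC after smoothing :", roc_auc_score(labels, smoothed))

In a full system the stacked feature vectors would feed a frame-level classifier whose output probabilities are then smoothed; the smoothing step typically removes isolated misclassified frames and so tends to raise the frame-wise AUC.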


Similar Articles

1. Detecting paralinguistic events in audio stream using context in features and probabilistic decisions. Comput Speech Lang. 2016 Mar;36:72-92. doi: 10.1016/j.csl.2015.08.003. Epub 2015 Sep 11.
2. Developmental changes in sensitivity to vocal paralanguage. Dev Sci. 2000 May;3(2):148-162. doi: 10.1111/1467-7687.00108. Epub 2001 Dec 25.
3. Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy. EURASIP J Audio Speech Music Process. 2022;2022(1):8. doi: 10.1186/s13636-022-00240-z. Epub 2022 Apr 15.
4. Acoustically-driven phoneme removal that preserves vocal affect cues. Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10095942. Epub 2023 May 5.
5. 'Should we laugh?' Acoustic features of (in)voluntary laughters in spontaneous conversations. Cogn Process. 2024 Feb;25(1):89-106. doi: 10.1007/s10339-023-01168-8. Epub 2023 Nov 23.
7. Detecting sarcasm from paralinguistic cues: anatomic and cognitive correlates in neurodegenerative disease. Neuroimage. 2009 Oct 1;47(4):2005-15. doi: 10.1016/j.neuroimage.2009.05.077. Epub 2009 Jun 6.
9. Toddler negative emotion expression and parent-toddler verbal conversation: Evidence from daylong recordings. Infant Behav Dev. 2022 May;67:101711. doi: 10.1016/j.infbeh.2022.101711. Epub 2022 Mar 26.
10. Prosody Dominates Over Semantics in Emotion Word Processing: Evidence From Cross-Channel and Cross-Modal Stroop Effects. J Speech Lang Hear Res. 2020 Mar 23;63(3):896-912. doi: 10.1044/2020_JSLHR-19-00258. Epub 2020 Mar 18.

