Singla Karan, Chen Zhuohao, Atkins David C, Narayanan Shrikanth
University of Southern California, Los Angeles, USA.
University of Washington, Seattle, WA, USA.
Proc Conf Assoc Comput Linguist Meet. 2020 Jul;2020:3797-3803. doi: 10.18653/v1/2020.acl-main.351.
Spoken language understanding tasks usually rely on pipelines involving complex processing blocks such as voice activity detection, speaker diarization, and automatic speech recognition (ASR). We propose a novel framework for predicting utterance-level labels directly from speech features, thus removing the dependency on first generating transcripts and enabling transcription-free behavioral coding. Our classifier uses a pretrained Speech-2-Vector encoder as a bottleneck to generate word-level representations from speech features. This pretrained encoder learns to encode the speech features of a word using an objective similar to Word2Vec. Our proposed approach uses only speech features and word segmentation information to predict spoken utterance-level target labels. We show that our model achieves results competitive with other state-of-the-art approaches that use transcribed text for the task of predicting psychotherapy-relevant behavior codes.
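To make the described setup concrete, below is a minimal sketch, assuming PyTorch, of a pipeline in the spirit of the abstract: frame-level speech features for each word segment are compressed into a word-level vector by a "Speech-2-Vector"-style bottleneck encoder, and an utterance-level classifier maps the sequence of word vectors to behavior-code logits. The class names, BiLSTM choice, dimensions, and number of labels are illustrative assumptions, not the authors' implementation; in the paper the encoder is pretrained with a Word2Vec-like objective, which is omitted here.

```python
import torch
import torch.nn as nn


class Speech2VecEncoder(nn.Module):
    """Encodes the frames of one word segment into a fixed-size vector.

    Placeholder for the pretrained Speech-2-Vector bottleneck; here it is
    simply an untrained BiLSTM.
    """

    def __init__(self, feat_dim: int = 40, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                           bidirectional=True)

    def forward(self, word_frames: torch.Tensor) -> torch.Tensor:
        # word_frames: (num_frames, feat_dim) for a single word segment
        _, (h_n, _) = self.rnn(word_frames.unsqueeze(0))
        # Concatenate final forward/backward hidden states -> (2 * hidden_dim,)
        return torch.cat([h_n[0, 0], h_n[1, 0]], dim=-1)


class UtteranceClassifier(nn.Module):
    """Aggregates word-level vectors and predicts an utterance-level label."""

    def __init__(self, word_dim: int = 256, hidden_dim: int = 128,
                 num_labels: int = 8):
        super().__init__()
        self.rnn = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (num_words, word_dim) for one utterance
        _, (h_n, _) = self.rnn(word_vectors.unsqueeze(0))
        return self.out(h_n[-1, 0])  # logits over behavior codes


# Dummy usage: one utterance of 5 words, each with 30 frames of 40-dim features.
# Word boundaries are assumed given, as in the paper's segmentation input.
encoder = Speech2VecEncoder()
classifier = UtteranceClassifier()
segments = [torch.randn(30, 40) for _ in range(5)]
word_vecs = torch.stack([encoder(seg) for seg in segments])
logits = classifier(word_vecs)
print(logits.shape)  # torch.Size([8])
```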