Chen Zhuohao, Singla Karan, Gibson James, Can Dogan, Imel Zac E, Atkins David C, Georgiou Panayiotis, Narayanan Shrikanth
Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA.
Department of Educational Psychology, University of Utah, Salt Lake City, UT, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2019 May;2019:6605-6609. doi: 10.1109/icassp.2019.8682885. Epub 2019 Apr 17.
In this work we address the problem of joint prosodic and lexical behavioral annotation for addiction counseling. We expand on past work that employed Recurrent Neural Networks (RNNs) on multimodal features by grouping and classifying subsets of classes. We propose two implementations. The first is hierarchical classification, which uses the behavior confusion matrix to cluster similar classes and makes predictions by traversing the resulting tree. The second is a graph-based method, which uses the output of the original classifier only to select a subset of the most probable candidate classes; the candidate set for each predicted class is determined by the class confusions. A second, simpler classifier then discriminates among the candidates. Evaluation shows that the strict hierarchical approach degrades performance, likely due to error propagation, while the graph-based hierarchy provides significant gains.
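The graph-based method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidate sets are sized or how the second classifier is built. The function names `candidate_sets` and `two_stage_predict`, the `top_k` cutoff, the toy confusion matrix, and the use of a plain score list in place of the simpler second-stage classifier are all assumptions made for illustration. The confusion matrix is assumed to have true classes as rows and predicted classes as columns.

```python
from typing import Dict, List, Sequence


def candidate_sets(confusion: Sequence[Sequence[int]], top_k: int = 1) -> Dict[int, List[int]]:
    """Map each predicted class to itself plus the true classes most often confused with it.

    Assumes confusion[t][p] counts items of true class t predicted as class p.
    top_k is a hypothetical cutoff, not a value taken from the paper.
    """
    n = len(confusion)
    sets: Dict[int, List[int]] = {}
    for pred in range(n):
        # Off-diagonal entries of column `pred`: true classes mistaken for `pred`.
        confusers = [
            (confusion[true][pred], true)
            for true in range(n)
            if true != pred and confusion[true][pred] > 0
        ]
        # Most frequent confusions first; ties broken by class index.
        confusers.sort(key=lambda item: (-item[0], item[1]))
        sets[pred] = [pred] + [true for _, true in confusers[:top_k]]
    return sets


def two_stage_predict(
    first_label: int,
    second_scores: Sequence[float],
    sets: Dict[int, List[int]],
) -> int:
    """Pick the best-scoring class, restricted to the first prediction's candidate set.

    second_scores stands in for the output of the simpler second classifier (assumption).
    """
    candidates = sets[first_label]
    return max(candidates, key=lambda cls: second_scores[cls])


# Toy 4-class confusion matrix (made-up counts, for illustration only).
confusion = [
    [50, 8, 2, 0],
    [10, 30, 0, 1],
    [1, 0, 40, 9],
    [0, 2, 12, 20],
]
sets = candidate_sets(confusion, top_k=1)
final = two_stage_predict(first_label=0, second_scores=[0.1, 0.2, 0.9, 0.0], sets=sets)
```

In this toy run the first stage predicts class 0, whose candidate set is `[0, 1]`, so the second stage chooses between those two classes only and ignores the high score for class 2. Unlike a strict tree, the candidate set always contains the first-stage prediction, so a correct first-stage decision is never ruled out before the second stage.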