Department of IT, Multimedia and Telecommunications (IMT), Universitat Oberta de Catalunya, 08018 Barcelona, Spain.
Sensors (Basel). 2022 Jul 13;22(14):5245. doi: 10.3390/s22145245.
In supervised learning, the generalization capabilities of trained models depend on the available annotations. Usually, multiple annotators are asked to annotate the dataset samples; the common practice is then to aggregate the different annotations by computing average scores or majority voting, and to train and test models on these aggregated annotations. However, this practice is not suitable for all types of problems, especially when the subjective information of each annotator matters for the task modeling. For example, emotions experienced while watching a video or evoked by other sources of content, such as news headlines, are subjective: different individuals might perceive or experience different emotions. Aggregated annotations in emotion modeling may therefore lose the subjective information and actually introduce an annotation bias. In this paper, we highlight the weaknesses of models trained on aggregated annotations for affect-related modeling tasks. More concretely, we compare two generic Deep Learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture. While the ST architecture models a single annotator's perception at a time, the MT architecture jointly models each individual annotation and the aggregated annotations at once. Our results show that the MT approach models both the individual annotations and the aggregated annotations more accurately than methods trained directly on the aggregated annotations. Furthermore, the MT approach achieves state-of-the-art results on the COGNIMUSE, IEMOCAP, and SemEval_2007 benchmarks.
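The ST/MT contrast described above can be sketched in code. The following is a minimal, illustrative NumPy sketch, not the authors' actual implementation: it assumes a shared representation layer whose output feeds one prediction head per annotator plus an additional head for the aggregated label, so that all annotation views are modeled jointly, whereas an ST model would keep only one of these heads. All layer sizes and weights here are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 samples, 8 input features, 3 annotators (all sizes arbitrary).
X = rng.normal(size=(4, 8))

# Hypothetical parameters of a shared encoder and the task-specific heads.
W_shared = rng.normal(size=(8, 16))   # shared representation layer
W_heads = rng.normal(size=(3, 16))    # one output head per annotator (MT)
w_agg = rng.normal(size=16)           # extra head for the aggregated label (MT)

h = np.tanh(X @ W_shared)             # shared hidden representation

# MT forward pass: predict every individual annotation and the aggregate jointly.
per_annotator = h @ W_heads.T         # shape (4, 3): one score per annotator
aggregated = h @ w_agg                # shape (4,): score for the aggregated label

# An ST model would instead use only a single head, e.g. per_annotator[:, 0].
print(per_annotator.shape, aggregated.shape)
```

In training, each head would receive its own loss term (one per annotator plus one for the aggregate), and the shared layer would be updated from their sum, which is what lets the subjective, per-annotator signal regularize the aggregate prediction rather than being discarded.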