Gupta Rahul, Audhkhasi Kartik, Jacokes Zach, Rozga Agata, Narayanan Shrikanth
IEEE Trans Affect Comput. 2018 Jan-Mar;9(1):76-89. doi: 10.1109/TAFFC.2016.2592918. Epub 2016 Jul 19.
Studies of time-continuous human behavioral phenomena often rely on ratings from multiple annotators. Since the ground truth of the target construct is often latent, the standard practice is to use ad hoc metrics, such as averaging the annotators' ratings. Although such metrics are easy to compute, they may not accurately represent the underlying construct. In this paper, we present a novel method for modeling multiple time series annotations of a continuous variable that estimates the ground truth by modeling annotator-specific distortions. We condition the ground truth on a set of features extracted from the data and further assume that annotators provide their ratings as modifications of the ground truth, with each annotator having specific distortion tendencies. We train the model with an Expectation-Maximization (EM) based algorithm and evaluate it on data from a study of natural interactions between a child and a psychologist, where the task is to predict confidence ratings of the children's smiles. We compare and analyze the model against two baselines: (i) the ground truth is taken to be the framewise mean of the ratings from the various annotators, and (ii) each annotator is assumed to have a distinct annotation delay, and the annotations are aligned before the framewise mean is computed.
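The abstract names the two baselines and an EM-trained latent model but does not give the exact distortion model, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a linear-Gaussian form: the latent ground truth is a_t = θᵀx_t + N(0, σ²), and annotator k reports r_kt = w_k·a_t + b_k + N(0, τ_k²), i.e. a per-annotator scale and offset stand in for "distortion tendencies". The delay baseline is likewise an assumed form that aligns every annotator to annotator 0 by maximizing correlation over integer lags. All function names (`framewise_mean`, `shift`, `delay_aligned_mean`, `em_ground_truth`) are hypothetical.

```python
import numpy as np


def framewise_mean(ratings):
    """Baseline (i): average the K annotators frame by frame; ratings is (K, T)."""
    return ratings.mean(axis=0)


def shift(series, d):
    """Return s with s[t] = series[t + d], padding with the edge values."""
    idx = np.clip(np.arange(len(series)) + d, 0, len(series) - 1)
    return series[idx]


def delay_aligned_mean(ratings, max_lag=20):
    """Baseline (ii), assumed form: align every annotator to annotator 0 by
    maximizing correlation over integer lags, then take the framewise mean."""
    ref = ratings[0]
    lags = range(-max_lag, max_lag + 1)
    aligned = []
    for r in ratings:
        best = max(lags, key=lambda d: np.corrcoef(shift(r, d), ref)[0, 1])
        aligned.append(shift(r, best))
    return np.mean(aligned, axis=0)


def em_ground_truth(X, ratings, n_iter=200):
    """Simplified EM (hypothetical model): a_t = theta.x_t + N(0, sigma2) and
    r_kt = w_k * a_t + b_k + N(0, tau2_k). X is (T, D), ratings is (K, T).
    Returns the posterior mean of the latent ground truth a_t."""
    T = X.shape[0]
    K = ratings.shape[0]
    theta = np.linalg.lstsq(X, ratings.mean(axis=0), rcond=None)[0]
    sigma2 = 1.0
    w, b, tau2 = np.ones(K), np.zeros(K), np.ones(K)
    for _ in range(n_iter):
        # E-step: Gaussian posterior of a_t (its variance v is the same for all t)
        v = 1.0 / (1.0 / sigma2 + np.sum(w**2 / tau2))
        m = v * (X @ theta / sigma2
                 + ((w / tau2)[:, None] * (ratings - b[:, None])).sum(axis=0))
        # M-step: regress the expected ground truth on the features
        theta = np.linalg.lstsq(X, m, rcond=None)[0]
        sigma2 = np.mean((m - X @ theta) ** 2) + v
        # M-step: per-annotator scale w_k, offset b_k and noise variance tau2_k
        A = np.array([[np.sum(m**2) + T * v, m.sum()], [m.sum(), T]])
        for k in range(K):
            w[k], b[k] = np.linalg.solve(A, [ratings[k] @ m, ratings[k].sum()])
            tau2[k] = np.mean((ratings[k] - w[k] * m - b[k]) ** 2) + w[k] ** 2 * v
        # Fix the scale ambiguity between a_t and w_k (an identifiability choice)
        c = w.mean()
        w, theta, sigma2 = w / c, theta * c, sigma2 * c**2
    return m
```

The E-step combines the feature-based prior θᵀx_t with every annotator's de-biased rating, weighting each by its estimated precision, so a noisy annotator contributes less than a consistent one; the framewise mean, by contrast, weights all annotators equally. Because the latent scale is otherwise unidentifiable, the sketch rescales so that the mean annotator gain is 1, which leaves the likelihood unchanged.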