Amiriparian Shahin, Han Jing, Schmitt Maximilian, Baird Alice, Mallol-Ragolta Adria, Milling Manuel, Gerczuk Maurice, Schuller Björn
ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany.
Group on Language, Audio & Music, Imperial College London, London, United Kingdom.
Front Robot AI. 2019 Nov 8;6:116. doi: 10.3389/frobt.2019.00116. eCollection 2019.
During both positive and negative dyadic exchanges, individuals often unconsciously imitate their partner. A substantial amount of research has been conducted on this phenomenon, and such studies have shown that synchronization between communication partners can improve interpersonal relationships. Automatic computational approaches for recognizing synchrony, however, are still in their infancy. In this study, we extend our previous work, in which we applied a novel method utilizing hand-crafted low-level acoustic descriptors and autoencoders (AEs) to analyse synchrony in the speech domain. For this purpose, a database of 394 in-the-wild speakers from six different cultures is used. Two AEs are implemented for each speaker in a dyadic exchange. After the training phase, the acoustic features of one speaker are tested using the AE trained on their dyadic partner. In the same way, we also explore the benefits that deep representations of audio may offer, employing the state-of-the-art Deep Spectrum toolkit. For all speakers, the reconstruction error from the AE trained on their respective dyadic partner is computed at varied time points during the interaction. The results of this acoustic analysis are then compared with linguistic experiments based on word counts and word embeddings generated by our approach. The results demonstrate that a degree of synchrony is present during all interactions, and that this degree varies across the six cultures in the investigated database. These findings are further substantiated using 4,096-dimensional Deep Spectrum features.
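The core idea of the acoustic analysis above — train an autoencoder on one speaker's feature vectors and use the reconstruction error it produces on the dyadic partner's features as a synchrony indicator — can be illustrated with a minimal sketch. This is not the authors' implementation: the linear autoencoder, its dimensions, learning rate, and the toy feature matrices below are all illustrative assumptions, standing in for the hand-crafted low-level acoustic descriptors or Deep Spectrum features used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden=8, lr=0.01, epochs=200):
    """Train a minimal linear autoencoder (X -> hidden -> X) by gradient
    descent on the mean squared reconstruction error. Purely illustrative;
    the study uses deeper AEs on real acoustic descriptors."""
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
    for _ in range(epochs):
        H = X @ W_enc                      # encode
        X_hat = H @ W_dec                  # decode
        err = X_hat - X                    # reconstruction residual
        grad_dec = H.T @ err / len(X)      # MSE gradient w.r.t. decoder
        grad_enc = X.T @ (err @ W_dec.T) / len(X)  # ... and encoder
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def reconstruction_error(X, W_enc, W_dec):
    """Mean squared error when reconstructing X through a trained AE."""
    X_hat = (X @ W_enc) @ W_dec
    return float(np.mean((X_hat - X) ** 2))

# Toy stand-ins for per-frame acoustic feature matrices of a dyad
# (100 frames, 16 features each); speaker B is partly "synchronized"
# with speaker A by construction.
speaker_a = rng.normal(size=(100, 16))
speaker_b = speaker_a + rng.normal(scale=0.5, size=(100, 16))

# Train on speaker A, then test speaker B's features on A's AE:
# a lower cross-speaker error would be read as higher synchrony.
W_enc, W_dec = train_autoencoder(speaker_a)
err_cross = reconstruction_error(speaker_b, W_enc, W_dec)
print(f"cross-speaker reconstruction error: {err_cross:.4f}")
```

In the study, this error is tracked at varied time points during the interaction (and mirrored with the partner's AE), so the trajectory of the cross-speaker error, not a single value, carries the synchrony signal.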