Rodrigues Ana, Sousa Bruna, Cardoso Amílcar, Machado Penousal
Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, 3004-531 Coimbra, Portugal.
Intelligent Systems Associate Laboratory (LASI), University of Minho, 4800-058 Guimarães, Portugal.
Entropy (Basel). 2022 Nov 22;24(12):1706. doi: 10.3390/e24121706.
The development of computational artifacts to study cross-modal associations has been a growing research topic, as they allow new degrees of abstraction. In this context, we propose a novel approach to the computational exploration of relationships between music and abstract images, grounded in findings from the cognitive sciences (emotion and perception). Due to the high-level nature of the problem, we rely on evolutionary programming techniques to evolve this audio-visual dialogue. To manage the problem's complexity, we develop a framework with four modules: (i) vocabulary set, (ii) music generator, (iii) image generator, and (iv) evolutionary engine. We test our approach by evolving a given music set into a corresponding set of images, steered by the expression of four emotions (angry, calm, happy, sad). We then perform preliminary user tests to evaluate whether users' perception is consistent with the system's expression. Results suggest agreement between users' emotional perception of the music-image pairs and the system outcomes, favoring the integration of cognitive science knowledge. We also discuss the benefit of applying evolutionary strategies, such as genetic programming, to multi-modal problems of a creative nature. Overall, this research contributes to a better understanding of the foundations of auditory-visual associations mediated by emotion and perception.
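The evolutionary engine described above can be illustrated with a minimal sketch: individuals encode parameters of an abstract-image generator, and fitness rewards closeness of the expressed emotions to a target emotion profile. All names, the emotion encoding, and the genetic operators here are illustrative assumptions, not the authors' actual implementation.

```python
import random

# Hypothetical sketch of an evolutionary engine (module iv): genomes are
# parameter vectors for an image generator; fitness measures closeness of
# the expressed emotions to a target profile. The direct gene-to-emotion
# mapping below is a toy assumption standing in for the real generator.

TARGET = {"angry": 0.0, "calm": 0.0, "happy": 1.0, "sad": 0.0}  # evolve toward "happy"

def express_emotions(genome):
    # Toy stand-in for image generation + emotion analysis: pretend the
    # first four genes directly encode the four emotion intensities.
    return dict(zip(("angry", "calm", "happy", "sad"), genome[:4]))

def fitness(genome):
    expressed = express_emotions(genome)
    # Negative squared error to the target profile: higher is better.
    return -sum((expressed[e] - TARGET[e]) ** 2 for e in TARGET)

def evolve(pop_size=30, genome_len=8, generations=50, mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]             # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g if rng.random() > mut_rate else rng.random()
                     for g in child]               # uniform random mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases; after a few dozen generations the leading genome's emotion profile approaches the target.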