Marczak-Czajka Agnieszka, Redgrave Timothy, Mitcheff Mahsa, Villano Michael, Czajka Adam
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States.
Department of Psychology, University of Notre Dame, Notre Dame, IN, United States.
Front Psychol. 2024 Dec 24;15:1509392. doi: 10.3389/fpsyg.2024.1509392. eCollection 2024.
Although it is documented that visual stimuli synthesized by Artificial Neural Networks (ANN) may evoke emotional reactions, the precise mechanisms linking the strength and type of those reactions to the way ANNs are used to synthesize the stimuli are yet to be discovered. Understanding these mechanisms would allow the design of methods that synthesize images attenuating or enhancing selected emotional states, which may provide unobtrusive and widely applicable treatment of mental dysfunctions and disorders.
A Convolutional Neural Network (CNN), a type of ANN used in computer vision that models the ways humans solve visual tasks, was applied to synthesize ("dream" or "hallucinate") images with no semantic content by maximizing the activations of neurons in precisely selected layers of the CNN. The emotions evoked in 150 human subjects observing these images were self-reported on a two-dimensional scale (arousal and valence) using self-assessment manikin (SAM) figures. Correlations were calculated between the arousal and valence values and both the visual properties of the images (e.g., color, brightness, clutter feature congestion, and clutter sub-band entropy) and the position of the CNN layers stimulated to obtain a given image.
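The synthesis step described above is a form of activation maximization: gradient ascent on the pixels of an input image so that a chosen layer responds as strongly as possible. The following is only a minimal sketch of that idea, not the authors' pipeline. It assumes a toy stand-in "layer" (a random linear map followed by ReLU) instead of a trained CNN, and the names `layer_activation`, `activation_gradient`, `synthesize`, and `toy_weights` are hypothetical.

```python
import numpy as np


def layer_activation(weights, image):
    """Mean ReLU activation of a toy linear 'layer' for a flattened image."""
    return np.maximum(weights @ image, 0.0).mean()


def activation_gradient(weights, image):
    """Gradient of the mean ReLU activation with respect to the pixels."""
    # Only units whose pre-activation is positive pass gradient through ReLU.
    active = (weights @ image > 0.0).astype(float)
    return weights.T @ active / weights.shape[0]


def synthesize(weights, steps=200, step_size=0.05, seed=0):
    """Gradient ascent on the pixels, keeping them in the range [0, 1]."""
    rng = np.random.default_rng(seed)
    # Start from a nearly uniform gray image with a little noise.
    image = np.clip(0.5 + 0.01 * rng.standard_normal(weights.shape[1]), 0.0, 1.0)
    start = layer_activation(weights, image)
    for _ in range(steps):
        grad = activation_gradient(weights, image)
        norm = np.linalg.norm(grad)
        if norm == 0.0:
            break  # no active units, nothing left to climb
        # Normalized ascent step, then projection back onto valid pixel values.
        image = np.clip(image + step_size * grad / norm, 0.0, 1.0)
    return image, start, layer_activation(weights, image)


# Assumption: 16 hypothetical units looking at a flattened 8x8 "image".
toy_weights = np.random.default_rng(42).standard_normal((16, 64))
image, before, after = synthesize(toy_weights)
print(f"mean activation before: {before:.4f}, after: {after:.4f}")
```

In a real setting the toy layer would be replaced by a selected layer of a trained CNN, with the gradient obtained by automatic differentiation; the structure of the loop stays the same.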
Synthesized images that maximized activations of some of the CNN layers led to significantly higher or lower arousal and valence levels compared to the subjects' average reactions. Multiple linear regression analysis found that a small set of selected global visual features of the images (hue, feature congestion, and sub-band entropy) are significant predictors of the measured arousal; however, no statistically significant dependencies were found between the global visual features and the measured valence.
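To illustrate what a multiple linear regression of arousal on image features involves, the sketch below fits an ordinary least squares model with NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the coefficients in `true_coefs` are arbitrary, and the names `fit_linear_regression`, `features`, and `arousal` are hypothetical. It does not reproduce the study's data, effect sizes, or significance tests.

```python
import numpy as np


def fit_linear_regression(features, target):
    """Ordinary least squares with an intercept; returns (coefs, r_squared)."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coefs
    total = target - target.mean()
    r_squared = 1.0 - (residuals @ residuals) / (total @ total)
    return coefs, r_squared


# Synthetic stand-ins for three standardized predictors per image:
# hue, feature congestion, sub-band entropy (values are NOT from the study).
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 3))

# Arbitrary illustrative coefficients: intercept, hue, congestion, entropy.
true_coefs = np.array([5.0, 0.4, -0.3, 0.6])
noise = 0.1 * rng.standard_normal(200)
arousal = true_coefs[0] + features @ true_coefs[1:] + noise

coefs, r_squared = fit_linear_regression(features, arousal)
print("estimated coefficients:", np.round(coefs, 3))
print("R^2:", round(float(r_squared), 3))
```

Testing whether each predictor is statistically significant, as the study reports, would additionally require standard errors and p-values for the coefficients, which a statistics package would normally supply.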
This study demonstrates that synthesizing images by maximizing the activations of small, precisely selected parts of the CNN, as done in this work, may yield visual stimuli that enhance or attenuate emotional reactions. This method paves the way for developing tools that provide non-invasive stimulation to support wellbeing (managing stress, enhancing mood) and to assist patients with certain mental conditions by complementing traditional therapeutic interventions.