ETIS, UMR 8051, ENSEA, CY Cergy Paris Université, CNRS, Cergy-Pontoise, France.
Service de Psychiatrie de l'Enfant et de l'Adolescent, Hôpital Pitié-Salpêtrière, AP-HP, Paris, France.
Sci Rep. 2024 Sep 3;14(1):20492. doi: 10.1038/s41598-024-69245-2.
A social individual needs to effectively manage the large amount of complex information in their environment, relative to their own goals, in order to extract what is relevant. This paper presents a neural architecture aiming to reproduce in robots the attention mechanisms (alerting/orienting/selecting) that humans use efficiently during audiovisual tasks. We evaluated the system on its ability to identify relevant sources of information on the faces of subjects emitting vowels. We propose a developmental model of audio-visual attention (MAVA) combining Hebbian learning with a competition between saliency maps based on visual movement and audio energy. MAVA effectively combines bottom-up and top-down information to orient the system toward pertinent areas. The system has several advantages, including online and autonomous learning abilities, low computation time, and robustness to environmental noise. MAVA outperforms other artificial models for detecting speech sources under various noise conditions.
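The abstract describes MAVA's core mechanism: bottom-up saliency maps computed from visual movement and audio energy, fused through Hebbian-learned weights, with a competition (winner-take-all) selecting where to orient attention. The following is a minimal sketch of that idea, not the authors' implementation; the frame-difference saliency, the Gaussian projection of audio energy onto the image plane, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def motion_saliency(prev_frame, frame):
    """Bottom-up visual saliency from inter-frame movement
    (absolute frame difference, normalized to sum to 1)."""
    sal = np.abs(frame.astype(float) - prev_frame.astype(float))
    total = sal.sum()
    return sal / total if total > 0 else sal

def audio_saliency(shape, source_col, energy, sigma=2.0):
    """Hypothetical mapping of short-term audio energy onto the image
    plane: a Gaussian column centred on the estimated source direction."""
    cols = np.arange(shape[1])
    profile = energy * np.exp(-((cols - source_col) ** 2) / (2 * sigma ** 2))
    return np.tile(profile, (shape[0], 1))

def hebbian_update(w, pre, post, lr=0.1):
    """Plain Hebbian rule: the audio-visual coupling weight grows with
    co-activation of the two saliency maps (online, unsupervised)."""
    return w + lr * pre * post

def attend(visual_map, audio_map, w_va):
    """Competition between modalities: fuse the maps with the learned
    weight and let the most active location win (winner-take-all)."""
    fused = visual_map + w_va * audio_map
    return np.unravel_index(np.argmax(fused), fused.shape), fused

# Toy usage: movement and sound agree on column 5, so attention
# orients there and the coupling weight is reinforced.
prev = np.zeros((8, 10))
frame = np.zeros((8, 10))
frame[4, 5] = 10.0                      # simulated lip movement
v = motion_saliency(prev, frame)
a = audio_saliency(v.shape, source_col=5, energy=1.0)
(row, col), fused = attend(v, a, w_va=0.5)
w_new = hebbian_update(0.5, v[row, col], a[row, col])
```

Because the fusion weight is learned online from co-occurring audio-visual activity, a sketch like this degrades gracefully when one modality is noisy, which is consistent with the robustness the abstract claims.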