Ruarte Gonzalo, Bujia Gaston, Care Damián, Ison Matias Julian, Kamienkowski Juan Esteban
Laboratorio de Inteligencia Artificial Aplicada (LIAA), Instituto de Ciencias de la Computación (ICC), CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina.
Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina.
Sci Rep. 2025 May 12;15(1):16482. doi: 10.1038/s41598-025-00272-3.
Visual search is crucial in daily human interaction with the environment. Hybrid search extends this by requiring observers to find any item from a given set. A few models have recently been proposed to simulate human eye movements in visual search tasks within natural scenes, but none has been implemented for Hybrid search under similar conditions. We present an enhanced neural network Entropy Limit Minimization (nnELM) model, grounded in a Bayesian framework and signal detection theory, and the Hybrid Search Eye Movements (HSEM) Dataset, containing thousands of human eye movements recorded during Hybrid search tasks. A key challenge in Hybrid search is that participants must look for several different objects at the same time. To address this, we developed several strategies involving the posterior probability distributions after each fixation. Adjusting peripheral visibility improved the model's efficiency in the early stages of search, aligning it with human behavior. Limiting the model's memory reduced success in longer searches, mirroring human performance. We validated these improvements by comparing our model with a held-out set within the HSEM and with other models in a separate visual search benchmark. Overall, the new nnELM model not only handles Hybrid search in natural scenes but also closely replicates human behavior, advancing our understanding of search processes while maintaining interpretability.
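To make the idea of working with one posterior per target more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional grid of candidate locations, a hypothetical exponential likelihood weighted by a visibility value d', and two illustrative ways of merging the per-target posteriors ("max" and "mean"). The abstract does not specify which strategies were used, so these merging rules, the function names, and the visibility falloff are all assumptions. The fixation rule is the entropy-limit-minimization heuristic as it is commonly stated: choose the location that maximizes the posterior weighted by squared visibility.

```python
import math


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def update_posterior(prior, evidence, visibility):
    """One Bayesian update for a single target (assumed likelihood form).

    evidence[i]   : template response at cell i, higher means more target-like
    visibility[i] : assumed d' at cell i for the current fixation
    """
    unnormalized = [
        p * math.exp(d * w) for p, d, w in zip(prior, visibility, evidence)
    ]
    return normalize(unnormalized)


def combine_targets(posteriors, strategy="max"):
    """Merge per-target posteriors into one map (hypothetical strategies)."""
    per_cell = list(zip(*posteriors))
    if strategy == "max":
        merged = [max(cell) for cell in per_cell]
    elif strategy == "mean":
        merged = [sum(cell) / len(cell) for cell in per_cell]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return normalize(merged)


def elm_next_fixation(posterior, falloff=1.0):
    """Pick the cell k maximizing sum_i p_i * d'(i, k)**2.

    d'(i, k) = exp(-|i - k| / falloff) is an assumed visibility falloff.
    """
    n = len(posterior)

    def score(k):
        return sum(
            p * math.exp(-abs(i - k) / falloff) ** 2
            for i, p in enumerate(posterior)
        )

    return max(range(n), key=score)


# Two memorized targets on a 4-cell grid, uniform prior, full visibility.
prior = [0.25] * 4
visibility = [1.0] * 4
post_a = update_posterior(prior, [0.0, 2.0, 0.0, 0.0], visibility)
post_b = update_posterior(prior, [0.0, 0.0, 0.0, 3.0], visibility)
combined = combine_targets([post_a, post_b], strategy="max")
next_cell = elm_next_fixation(combined)
```

In this toy run the second target produces the stronger evidence, so the merged map and the chosen fixation both favor the last cell. A memory limit of the kind mentioned in the abstract could be approximated by updating the posteriors with only the most recent fixations, but that is left out here to keep the sketch short.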