Zanganeh Momtaz Hassan, Daliri Mohammad Reza
Neuroscience and Neuroengineering Research Lab., Biomedical Engineering Department, Faculty of Electrical Engineering, Iran University of Science and Technology (IUST), Narmak, 16846-13114 Tehran, Iran; School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Niavaran, P.O. Box 19395-5746, Tehran, Iran.
Cogn Neurodyn. 2016 Feb;10(1):31-47. doi: 10.1007/s11571-015-9357-x. Epub 2015 Oct 7.
In recent years, there has been considerable interest in visual attention models (saliency maps of visual attention). These models can be used to predict eye fixation locations and therefore have many applications across fields, leading to better performance in machine vision systems. Most of these models need improvement because they rely on bottom-up computation that ignores top-down image semantic content and often does not match actual eye fixation locations. In this study, we recorded the eye movements (i.e., fixations) of fourteen individuals who viewed images consisting of natural (e.g., landscape, animal) and man-made (e.g., building, vehicle) scenes. We extracted the fixation locations of the eye movements in the two image categories. After extracting the fixation areas (a patch around each fixation location), the characteristics of these areas were evaluated against those of non-fixation areas. The features extracted from each patch were orientation and spatial frequency. After the feature-extraction phase, different statistical classifiers were trained to predict eye fixation locations from these features. This study connects eye-tracking results to the automatic prediction of salient regions of images. The results showed that it is possible to predict eye fixation locations using the image patches around subjects' fixation points.
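The pipeline described in the abstract (crop a patch around each fixation, compute orientation and spatial-frequency features, train a classifier to separate fixated from non-fixated patches) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gradient-based orientation histogram and the radially averaged FFT power spectrum are simple stand-ins for the paper's orientation and spatial-frequency descriptors, the nearest-mean classifier stands in for the statistical classifiers they compared, and the patch size (32 px) and synthetic demo data are assumptions.

```python
import numpy as np

def extract_patch(image, fixation, size=32):
    """Crop a square patch centered on a (row, col) fixation point.
    No border handling -- assumes the fixation is far enough from the edges."""
    r, c = fixation
    h = size // 2
    return image[r - h:r + h, c - h:c + h]

def patch_features(patch, n_bins=8):
    """Orientation histogram + radially averaged power spectrum.
    Simple stand-ins for the paper's orientation and spatial-frequency features."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)
    # Gradient-magnitude-weighted orientation histogram over [-pi, pi].
    ori_hist, _ = np.histogram(ori, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    ori_hist = ori_hist / (ori_hist.sum() + 1e-9)
    # Radially averaged FFT power spectrum: a coarse spatial-frequency profile.
    power = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    cy, cx = power.shape[0] // 2, power.shape[1] // 2
    yy, xx = np.indices(power.shape)
    radius = np.hypot(yy - cy, xx - cx).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.maximum(np.bincount(radius.ravel()), 1)
    sf = (sums / counts)[:n_bins]
    sf = sf / (sf.sum() + 1e-9)
    return np.concatenate([ori_hist, sf])

class NearestMeanClassifier:
    """Toy stand-in for the statistical classifiers used in the study."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Synthetic demo: smooth sinusoidal patches vs. noise patches stand in for
# fixation vs. non-fixation areas (real data would come from an eye tracker).
rng = np.random.default_rng(0)
smooth = [np.tile(np.sin(np.linspace(0, 2 * np.pi, 32) + p), (32, 1))
          for p in np.linspace(0, np.pi, 10)]
noisy = [rng.standard_normal((32, 32)) for _ in range(10)]
X = np.array([patch_features(p) for p in smooth + noisy])
y = np.array([0] * 10 + [1] * 10)

clf = NearestMeanClassifier().fit(X[::2], y[::2])   # train on even indices
acc = (clf.predict(X[1::2]) == y[1::2]).mean()      # test on odd indices

image = rng.standard_normal((100, 100))
patch = extract_patch(image, (50, 50))
```

Because the two synthetic patch classes differ sharply in both orientation structure and spatial-frequency content, even the nearest-mean classifier separates them; with real fixation data, the paper's statistical classifiers would take the place of this toy model.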