Bhandari Shailendra, Lencastre Pedro, Mathema Rujeena, Szorkovszky Alexander, Yazidi Anis, Lind Pedro G
Department of Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway.
OsloMet Artificial Intelligence Lab, Pilestredet 52, N-0166, Oslo, Norway.
Sci Rep. 2025 Jun 6;15(1):19929. doi: 10.1038/s41598-025-05286-5.
Accurate modeling of eye gaze dynamics is essential for advancing human-computer interaction, neurological diagnostics, and cognitive research. Traditional generative models such as Markov models often fail to capture the complex temporal dependencies and distributional nuances inherent in eye gaze trajectory data. This study introduces a Generative Adversarial Network (GAN) framework that employs Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) generators and discriminators to generate high-fidelity synthetic eye gaze velocity trajectories. We conducted a comprehensive evaluation of four GAN architectures (CNN-CNN, LSTM-CNN, CNN-LSTM, and LSTM-LSTM), each trained under two conditions: using only the adversarial loss, and using a weighted combination of adversarial and spectral losses. Our findings reveal that the LSTM-CNN architecture trained with the combined adversarial-spectral loss aligns most closely with the real data distribution, effectively capturing both the distribution tails and the intricate temporal dependencies. The inclusion of spectral regularization significantly enhances the GANs' ability to replicate the spectral characteristics of eye gaze movements, leading to a more stable learning process and improved data fidelity. A comparative analysis against a Hidden Markov Model (HMM) optimized to four hidden states further highlights the advantages of the LSTM-CNN GAN. Statistical metrics show that the HMM-generated data diverge significantly from the real data in mean, standard deviation, skewness, and kurtosis. In contrast, the LSTM-CNN model closely matches the real data across these statistics, affirming its capacity to model the complexity of eye gaze dynamics effectively. These results position the spectrally regularized LSTM-CNN GAN as a robust tool for generating synthetic eye gaze velocity data with high fidelity.
Its ability to accurately replicate both the distributional and temporal properties of real data holds significant potential for applications in simulation environments, training systems, and the development of advanced eye-tracking technologies, ultimately contributing to more naturalistic and responsive human-computer interactions.
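The abstract does not define the spectral loss or the weight used to combine it with the adversarial loss. The following is a minimal NumPy sketch of one plausible form, not the authors' method: it assumes the spectral term is the mean squared difference between the batch-averaged log power spectra of real and generated velocity sequences, that the adversarial term is the standard non-saturating generator loss, and that the total objective is L = L_adv + lam * L_spec. The function names and the weight `lam` are hypothetical; a real training loop would compute the same quantities in an autodiff framework so that gradients reach the generator.

```python
import numpy as np

def power_spectrum(x):
    """Power spectrum of each sequence via the real FFT; x has shape (batch, time)."""
    return np.abs(np.fft.rfft(x, axis=-1)) ** 2

def spectral_loss(real, fake, eps=1e-8):
    """Hypothetical spectral term: MSE between batch-averaged log power spectra."""
    p_real = np.log(power_spectrum(real).mean(axis=0) + eps)
    p_fake = np.log(power_spectrum(fake).mean(axis=0) + eps)
    return float(np.mean((p_real - p_fake) ** 2))

def adversarial_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss -mean(log D(G(z))); d_fake holds probabilities in (0, 1)."""
    return float(-np.mean(np.log(d_fake + eps)))

def combined_loss(d_fake, real, fake, lam=0.1):
    """Assumed weighted objective L = L_adv + lam * L_spec (lam is an illustrative weight)."""
    return adversarial_loss(d_fake) + lam * spectral_loss(real, fake)

# Toy usage with synthetic arrays (not eye-tracking data).
rng = np.random.default_rng(0)
real = np.cumsum(rng.normal(size=(32, 128)), axis=1)  # random-walk traces: power concentrated at low frequencies
fake = rng.normal(size=(32, 128))                      # white noise: roughly flat spectrum
d_fake = rng.uniform(0.2, 0.8, size=32)                # stand-in discriminator outputs on fake samples
print(combined_loss(d_fake, real, fake, lam=0.1))
```

Because white noise and a random walk have very different spectra, the spectral term penalizes the generator even when the discriminator is fooled, which is the intuition behind adding a spectral regularizer to the adversarial loss.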
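To make the comparison of distributional and temporal properties concrete, here is a hedged sketch that summarizes a velocity series by the four statistics named in the abstract (mean, standard deviation, skewness, kurtosis) and by its sample autocorrelation. It assumes simple moment-based estimators and reports excess kurtosis (0 for a Gaussian); the paper's exact estimators and any additional metrics are not stated in the abstract, and the helper names are hypothetical. The toy data are random draws, not eye-tracking recordings.

```python
import numpy as np

def moment_summary(x):
    """Mean, standard deviation, skewness and excess kurtosis of a 1-D sample."""
    x = np.asarray(x, dtype=float).ravel()
    mu = x.mean()
    sigma = x.std()
    z = (x - mu) / sigma
    return {
        "mean": mu,
        "std": sigma,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4) - 3.0,  # excess kurtosis: 0 for a Gaussian
    }

def autocorrelation(x, max_lag=10):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def compare(real, synthetic, max_lag=10):
    """Absolute gaps in the four moments plus the largest autocorrelation gap."""
    m_real, m_syn = moment_summary(real), moment_summary(synthetic)
    gaps = {k: abs(m_real[k] - m_syn[k]) for k in m_real}
    acf_gap = np.abs(autocorrelation(real, max_lag) - autocorrelation(synthetic, max_lag))
    gaps["acf_max_gap"] = float(np.max(acf_gap))
    return gaps

# Toy usage: a heavy-tailed series versus a Gaussian one.
rng = np.random.default_rng(1)
real = rng.standard_t(df=3, size=5000)  # heavy tails, large kurtosis
synthetic = rng.normal(size=5000)       # Gaussian, excess kurtosis near 0
print(compare(real, synthetic))
```

A generator that matches only the mean and standard deviation would still show a large kurtosis gap here, which is why tail-sensitive statistics are useful for judging fidelity.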