Koorathota Sharath, Thakoor Kaveri, Hong Linbi, Mao Yaoli, Adelman Patrick, Sajda Paul
Department of Biomedical Engineering, Columbia University, New York, NY, United States.
Fovea Inc., New York, NY, United States.
Front Psychol. 2021 Feb 1;12:604522. doi: 10.3389/fpsyg.2021.604522. eCollection 2021.
There is increasing interest in how the pupil dynamics of the eye reflect underlying cognitive processes and brain states. A problem, however, is that pupil changes can be driven by non-cognitive factors, for example luminance changes in the environment, accommodation, and eye movement. In this paper we consider how, by modeling the pupil response in real-world environments, we can capture the non-cognitive changes and remove them to extract a residual signal that is a better index of cognition and performance. Specifically, we use sequence measures such as fixation position and duration, saccades, and blink-related information as inputs to a deep recurrent neural network (RNN) model that predicts subsequent pupil diameter. We build and evaluate the model for a task in which subjects watch educational videos and are subsequently asked questions based on the content. Compared to commonly used models for this task, the RNN had the lowest error in predicting subsequent pupil dilation given the sequence data. Most important was how the model output related to subjects' cognitive performance as assessed by a post-viewing test. Consistent with our hypothesis that the model captures non-cognitive pupil dynamics, we found that (1) the model's root-mean-square error was lower for low-performing subjects than for those who performed better on the post-viewing test, (2) the residuals of the RNN (LSTM) model had the highest correlation with subjects' post-viewing test scores, and (3) the residuals had the highest discriminability (assessed via area under the ROC curve, AUC) for classifying high and low test performers, compared to the true pupil size or the RNN model predictions. This suggests that deep learning sequence models may be useful for separating components of pupil responses linked to luminance and accommodation from those linked to cognition and arousal.
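The abstract describes a sequence-to-one setup: an LSTM takes windows of eye-movement features and predicts the subsequent pupil diameter, and the residual (true minus predicted pupil size) is then used as a cognition index. The sketch below, in PyTorch, illustrates one plausible implementation of that setup; the feature set, window length, and hyperparameters are illustrative assumptions rather than the authors' actual configuration.

```python
# Minimal sketch of an LSTM predicting the next pupil diameter from eye-movement
# sequences. Feature names, dimensions, and hyperparameters are assumptions.
import torch
import torch.nn as nn

class PupilLSTM(nn.Module):
    """Maps a window of eye-movement features to the subsequent pupil diameter."""
    def __init__(self, n_features=6, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, time, n_features), e.g. fixation x/y, fixation duration,
        # saccade amplitude, blink flag, time since last blink (assumed features)
        out, _ = self.lstm(x)
        # use the last time step's hidden state to predict the next pupil diameter
        return self.head(out[:, -1, :]).squeeze(-1)

model = PupilLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# one illustrative training step on random placeholder data
x = torch.randn(32, 50, 6)   # 32 windows, 50 time steps, 6 features
y = torch.randn(32)          # subsequent pupil diameter (z-scored)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# the residual signal proposed as a cognition index: true minus predicted pupil size
with torch.no_grad():
    residual = y - model(x)
```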
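The abstract reports three subject-level evaluations: prediction RMSE, correlation of the residual signal with post-viewing test scores, and AUC for separating high from low performers. The sketch below shows how such measures might be computed with NumPy, SciPy, and scikit-learn on placeholder data; the median split and the mean-absolute-residual summary statistic are assumptions, not the authors' stated analysis choices.

```python
# Hedged sketch of the subject-level evaluation: per-subject RMSE, correlation of a
# residual summary with test scores, and AUC for high vs. low performers.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_subjects = 20
# placeholder per-subject data: true pupil traces, model predictions, test scores
true_pupil = rng.standard_normal((n_subjects, 500))
predicted = true_pupil + 0.3 * rng.standard_normal((n_subjects, 500))
scores = rng.uniform(0, 100, n_subjects)

residuals = true_pupil - predicted
rmse = np.sqrt(np.mean(residuals ** 2, axis=1))    # (1) per-subject prediction RMSE
resid_index = np.mean(np.abs(residuals), axis=1)   # assumed summary of the residual signal

# (2) correlation between the residual summary and post-viewing test performance
r, p = pearsonr(resid_index, scores)
print(f"residual index vs. test score: r={r:.2f}, p={p:.3f}")

# (3) discriminability of high vs. low performers (median split) assessed via AUC
high_performer = (scores > np.median(scores)).astype(int)
auc = roc_auc_score(high_performer, resid_index)
print(f"AUC for residual index: {auc:.2f}")
```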