Nousias George, Delibasis Konstantinos K, Labiris Georgios
Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece.
Department of Ophthalmology, General University Hospital of Alexandroupolis, 68131 Alexandroupolis, Greece.
J Imaging. 2025 Jan 19;11(1):27. doi: 10.3390/jimaging11010027.
Blink detection is considered a useful indicator of both clinical conditions and drowsiness. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of an eye detector that extracts the eye regions from each video frame. The cropped eye regions are organized as three-dimensional (3D) input, with the third dimension spanning a time window of 300 ms. Two different 3D convolutional neural networks are utilized (a simple 3D CNN and a 3D ResNet), as well as a 3D autoencoder combined with a classifier coupled to the latent space. Finally, we propose the use of a frame prediction accumulator combined with morphological processing and watershed segmentation to detect blinks and determine their start and stop frames in previously unseen videos. The proposed framework was trained on ten different participants and tested on five different ones, with a total of 162,400 frames and 1172 blinks for each eye. The start and end frames of each blink in the dataset have been annotated by a specialized ophthalmologist. Quantitative comparison with state-of-the-art blink detection methodologies provides favorable results for the proposed neural architectures coupled with the prediction accumulator, with the 3D ResNet being the best-performing as well as the fastest.
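The post-processing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, the 3-frame structuring element, and the 9-frame window (300 ms at an assumed 30 frames per second) are all hypothetical choices made for illustration. The watershed segmentation used in the paper is replaced here by plain connected-component labeling, which is enough to recover start and stop frames when blinks do not overlap.

```python
import numpy as np
from scipy import ndimage


def accumulate_predictions(window_probs, window_len, n_frames):
    """Average each sliding window's blink probability over the frames it covers.

    window_probs[i] is the (hypothetical) classifier output for the window
    that starts at frame i and spans window_len frames.
    """
    acc = np.zeros(n_frames, dtype=float)
    counts = np.zeros(n_frames, dtype=float)
    for start, prob in enumerate(window_probs):
        stop = min(start + window_len, n_frames)
        acc[start:stop] += prob
        counts[start:stop] += 1
    # Frames covered by no window keep a score of 0.
    return acc / np.maximum(counts, 1)


def blink_intervals(frame_scores, threshold=0.5, min_len=3):
    """Return (start_frame, stop_frame) pairs, inclusive, for detected blinks."""
    mask = np.asarray(frame_scores) >= threshold
    structure = np.ones(min_len, dtype=bool)
    # Closing fills short gaps inside a blink; opening removes short spikes.
    mask = ndimage.binary_closing(mask, structure=structure)
    mask = ndimage.binary_opening(mask, structure=structure)
    # Assumption: connected-component labeling stands in for watershed here.
    labels, _ = ndimage.label(mask)
    return [(sl.start, sl.stop - 1) for (sl,) in ndimage.find_objects(labels)]


# Usage: 40 frames, 9-frame windows, windows 10..14 classified as "blink".
n_frames, window_len = 40, 9
window_probs = np.zeros(n_frames - window_len + 1)
window_probs[10:15] = 1.0
scores = accumulate_predictions(window_probs, window_len, n_frames)
intervals = blink_intervals(scores)
```

Averaging over every window that covers a frame smooths isolated misclassifications before thresholding, and the morphological opening then discards any remaining runs shorter than the structuring element.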