Bissarinova Ulzhan, Rakhimzhanova Tomiris, Kenzhebalin Daulet, Varol Huseyin Atakan
Institute of Smart Systems and Artificial Intelligence, Nazarbayev University, Astana 010000, Kazakhstan.
Sensors (Basel). 2024 Feb 22;24(5):1409. doi: 10.3390/s24051409.
The use of event-based cameras in computer vision is a growing research direction. However, despite existing research on face detection with event cameras, there is still no large dataset with annotations for faces and facial landmarks on event streams, which hampers the development of applications in this direction. In this work, we address this issue by publishing the first large and varied dataset, Faces in Event Streams, comprising 689 min of recordings for face and facial landmark detection directly on event-based camera outputs. In addition, we present 12 models trained on this dataset to predict bounding box and facial landmark coordinates, with mAP scores above 90%. We also demonstrate real-time detection with an event-based camera using our models.
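The abstract reports mAP for bounding box prediction but does not state the evaluation protocol here. The following is a minimal sketch of how single-class average precision for face boxes is commonly computed, under assumptions that are ours and not the paper's: boxes given as (x1, y1, x2, y2), a match threshold of IoU ≥ 0.5, greedy matching by descending confidence, and Pascal VOC-style all-point interpolation. With a single "face" class, mAP reduces to this AP. The function names `iou` and `average_precision` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[str, float, Box]],  # (frame_id, confidence, box)
    gts: Dict[str, List[Box]],            # frame_id -> ground-truth face boxes
    iou_thr: float = 0.5,                 # assumed threshold, not from the paper
) -> float:
    """All-point interpolated AP for a single class (faces)."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {k: [False] * len(v) for k, v in gts.items()}
    tp: List[int] = []
    # Greedily match detections to ground truth, most confident first.
    for frame, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(frame, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thr and not matched[frame][best_j]:
            matched[frame][best_j] = True
            tp.append(1)  # true positive
        else:
            tp.append(0)  # false positive (no match or duplicate)
    # Cumulative precision and recall after each detection.
    recalls, precisions = [], []
    cum_tp = 0
    for i, t in enumerate(tp, start=1):
        cum_tp += t
        recalls.append(cum_tp / n_gt)
        precisions.append(cum_tp / i)
    # Precision envelope: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


if __name__ == "__main__":
    gts = {"f0": [(0, 0, 10, 10)], "f1": [(20, 20, 40, 40)]}
    preds = [
        ("f0", 0.9, (0, 0, 10, 10)),
        ("f1", 0.8, (0, 0, 5, 5)),      # false positive
        ("f1", 0.7, (20, 20, 40, 40)),
    ]
    print(average_precision(preds, gts))
```

In this toy example the confident false positive on frame `f1` lowers precision at 100% recall, so the AP is 5/6 rather than 1. Published face detection benchmarks may instead use COCO-style averaging over several IoU thresholds, which would give different numbers.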