Sheikh Burhan Ul Haque, Zafar Aasim
Department of Computer Science, Aligarh Muslim University, Aligarh 202002, Uttar Pradesh, India.
SN Comput Sci. 2023;4(3):288. doi: 10.1007/s42979-023-01738-9. Epub 2023 Mar 27.
COVID-19 is transmitted primarily through respiratory droplets produced when an infected person talks, coughs, or sneezes. To slow the spread of the virus, the WHO has instructed people to wear face masks in crowded and public areas. This paper proposes the rapid real-time face mask detection system (RRFMDS), an automated computer-aided system that detects face mask violations in real-time video. The system uses a single-shot multi-box detector for face detection and a fine-tuned MobileNetV2 for face mask classification. It is lightweight (low resource requirements) and can be integrated with pre-installed CCTV cameras to detect face mask violations. The system is trained on a custom dataset of 14,535 images: 5000 of incorrectly worn masks, 4789 of faces with masks, and 4746 of faces without masks. The dataset was created so that the system can detect almost all types of face masks in different orientations. The system distinguishes all three classes (incorrect mask, with mask, and without mask) with an average accuracy of 99.15% on training data and 97.81% on testing data. On average, it takes 0.14201142 s to process a single frame, including detecting faces in the video, processing the frame, and classification.
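The figures reported above can be checked with a little arithmetic. The following is a minimal sketch, not part of the paper: it uses only the class counts and the per-frame time stated in the abstract, and the names `CLASS_COUNTS`, `class_shares`, and `frames_per_second` are hypothetical helpers introduced for illustration. It suggests that the three classes are roughly balanced and that the reported per-frame time corresponds to about 7 frames per second.

```python
# Illustrative arithmetic only; all names here are hypothetical, not from the paper.

# Class counts as reported in the abstract.
CLASS_COUNTS = {
    "incorrect_mask": 5000,
    "with_mask": 4789,
    "without_mask": 4746,
}

# Reported average time to process a single frame, in seconds.
SECONDS_PER_FRAME = 0.14201142


def class_shares(counts):
    """Return each class's fraction of the whole dataset."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def frames_per_second(seconds_per_frame):
    """Convert an average per-frame latency into a throughput estimate."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    print("total images:", sum(CLASS_COUNTS.values()))
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name}: {share:.1%}")
    print(f"throughput: {frames_per_second(SECONDS_PER_FRAME):.2f} frames/s")
```

The throughput figure is an estimate derived from the reported average; actual frame rates would depend on hardware and on the number of faces per frame, neither of which is stated in the abstract.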