Lu Shida, Huang Kai, Meraj Talha, Rauf Hafiz Tayyab
State Grid Information & Communication Company, SMEPC, Shanghai, China.
Shanghai Shineenergy Information Technology Development Co., Ltd., Shanghai, China.
PeerJ Comput Sci. 2022 Apr 6;8:e879. doi: 10.7717/peerj-cs.879. eCollection 2022.
A Completely Automated Public Turing Test to tell Computers and Humans Apart (CAPTCHA) is used in web systems to secure authentication, but it can be broken with Optical Character Recognition (OCR)-type methods. CAPTCHA breakers make web systems highly insecure. However, techniques for breaking CAPTCHAs also show CAPTCHA designers where their designs need improvement to resist computer vision-based malicious attacks. Research in this area has primarily used deep learning methods to break state-of-the-art CAPTCHA codes; however, the validation schemes and conventional Convolutional Neural Network (CNN) designs still need more reliable validation and feature schemes that cover multiple aspects. Several public datasets of text-based CAPTCHAs are available, including those on Kaggle and other dataset repositories where CAPTCHA datasets can be self-generated. Previous studies are dataset-specific and do not perform well on other CAPTCHAs. Therefore, the proposed study uses two publicly available datasets of 4- and 5-character text-based CAPTCHA images to build a CAPTCHA solver. It uses a skip-connection-based CNN model to solve the CAPTCHAs. The study applies 5-fold cross-validation to the data, which yields 10 different CNN models across the two datasets, with promising results compared with other studies.
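The validation scheme in the abstract (5 folds on each of two datasets, giving 10 trained models) can be illustrated with a minimal sketch. It is not the authors' code: the helper `k_fold_indices`, the dataset names, and the dataset sizes are all hypothetical, chosen only to show how the count of 10 models arises.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical dataset names and sizes; the abstract does not state them.
datasets = {"four_char": 1000, "five_char": 1070}

runs = []
for name, n in datasets.items():
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(n, k=5)):
        # A real pipeline would train one CNN on train_idx here
        # and evaluate it on val_idx.
        runs.append((name, fold, len(train_idx), len(val_idx)))

print(len(runs))  # 2 datasets x 5 folds = 10 train/validation runs
```

Each image is used for validation exactly once per dataset, so the reported result does not depend on a single train/test split.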