Wang Ruimin, Li Haitao, Jing Jing, Jiang Liehui, Dong Weiyu
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China.
Key Laboratory of Cyberspace Situation Awareness of Henan Province, Zhengzhou 450000, China.
Sensors (Basel). 2022 Jun 29;22(13):4892. doi: 10.3390/s22134892.
As Internet of Things (IoT) devices become more intelligent and interconnected, they also become more vulnerable and more exposed to threats. Device identification underpins many cybersecurity operations, such as asset management, vulnerability response, and situational awareness, all of which are important for enhancing the security of IoT devices. The more information sources and perspectives available, the more precise the identification results. This study proposes a novel, alternative method for IoT device identification that uses commonly available WebUI login pages, which carry distinctive vendor-specific characteristics, as its data source. Device vendors are identified with an ensemble learning model that combines Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN), and device types and models are identified with a method based on Optical Character Recognition (OCR). The experimental results show that the ensemble learning model achieves 99.1% accuracy and a 99.5% F1-Score in determining whether a device comes from a vendor that appears in the training dataset and, if so, 98% accuracy and a 98.3% F1-Score in identifying which vendor it comes from. The OCR-based method can identify fine-grained device attributes and achieves 99.46% accuracy in device model identification, exceeding the result of the Shodan cyber search engine by a considerable margin of 11.39%.
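The abstract does not say how the CNN and DNN outputs are combined or how an "unknown vendor" decision is made. The sketch below is only an illustration under stated assumptions: it assumes both models emit per-vendor probability vectors, fuses them by weighted soft voting, and treats a fused top score below a hypothetical `threshold` as "not a vendor from the training set". The functions `soft_vote` and `identify_vendor` and the parameters `w_cnn` and `threshold` are hypothetical names, not the authors' implementation.

```python
from typing import List, Optional, Sequence


def soft_vote(cnn_probs: Sequence[float], dnn_probs: Sequence[float],
              w_cnn: float = 0.5) -> List[float]:
    """Hypothetical fusion: weighted average of two per-vendor probability vectors."""
    if len(cnn_probs) != len(dnn_probs):
        raise ValueError("probability vectors must have the same length")
    return [w_cnn * c + (1.0 - w_cnn) * d for c, d in zip(cnn_probs, dnn_probs)]


def identify_vendor(cnn_probs: Sequence[float], dnn_probs: Sequence[float],
                    vendors: Sequence[str], threshold: float = 0.8) -> Optional[str]:
    """Two-stage decision (assumed, not taken from the paper).

    Stage 1: if the best fused score is below `threshold`, treat the device as
    coming from a vendor that is not in the training set and return None.
    Stage 2: otherwise return the vendor with the highest fused score.
    """
    fused = soft_vote(cnn_probs, dnn_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    if fused[best] < threshold:
        return None
    return vendors[best]


if __name__ == "__main__":
    vendors = ["VendorA", "VendorB", "VendorC"]  # placeholder labels
    print(identify_vendor([0.9, 0.05, 0.05], [0.8, 0.1, 0.1], vendors))
    print(identify_vendor([0.4, 0.3, 0.3], [0.5, 0.3, 0.2], vendors))
```

With these made-up inputs, the first call fuses to a top score of 0.85 and returns `"VendorA"`. The second call peaks at 0.45 and returns `None`, meaning the device is treated as coming from an unseen vendor. A separately trained binary "known vs. unknown" classifier would be an equally plausible reading of the abstract.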