State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China.
Information Engineering University of China, Zhengzhou, China.
PLoS One. 2021 Jan 12;16(1):e0245098. doi: 10.1371/journal.pone.0245098. eCollection 2021.
The rapid expansion of the open-source community has shortened the software development cycle, but it has also accelerated the spread of vulnerabilities, especially in the Internet of Things. In recent years, the frequency of attacks against connected devices has increased exponentially, making these vulnerabilities more serious. State-of-the-art firmware security inspection technologies, such as methods based on machine learning and graph theory, can find similar applications based on known vulnerabilities but are ineffective without detailed information about those vulnerabilities. Moreover, the model training that machine learning requires consumes a significant amount of time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow features and data flow features are extracted from the functions of the firmware and of the vulnerabilities, and these features are used to calculate a SimHash for each function. Mass storage and fast querying of the SimHash values are implemented using the pigeonhole principle. Second, the similar function pairs are analyzed in detail within and among basic blocks. Within basic blocks, symbolic execution is used to generate the semantic information of each basic block, and a constraint solver is used to determine semantic equivalence. Among basic blocks, the local control flow graphs are analyzed to obtain their similarity. We then implement a prototype and present its evaluation. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis. It can also locate real-world firmware patches without information about the vulnerable function. Finally, we compare our method with existing methods. The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod. More than 90% of the firmware functions can be indexed within 0.1 s, while the search time for 100,000 firmware functions is less than 2 s.
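The SimHash and pigeonhole steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 64-bit fingerprint width, the BLAKE2b feature hash, the distance threshold of 3, and the names `simhash`, `hamming`, and `PigeonholeIndex` are all assumptions made for illustration, and the abstract does not specify how features are encoded or weighted. The underlying idea is that if two fingerprints differ in at most k bits and each is split into k + 1 blocks, at least one block must match exactly, so an exact-match lookup per block yields a small candidate set to verify.

```python
import hashlib
from collections import defaultdict

BITS = 64  # assumed fingerprint width


def simhash(features):
    """Compute a SimHash from a dict of feature string -> positive weight."""
    counts = [0] * BITS
    for feat, weight in features.items():
        digest = hashlib.blake2b(feat.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            # Each feature votes +weight or -weight on every bit position.
            counts[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Finds stored fingerprints within max_distance bits of a query."""

    def __init__(self, max_distance=3):
        self.k = max_distance
        self.blocks = max_distance + 1  # k + 1 blocks: one must match exactly
        if BITS % self.blocks:
            raise ValueError("this sketch needs BITS divisible by k + 1")
        self.width = BITS // self.blocks
        self.tables = defaultdict(list)

    def _keys(self, fp):
        mask = (1 << self.width) - 1
        for i in range(self.blocks):
            yield i, (fp >> (i * self.width)) & mask

    def add(self, name, fp):
        # Store the entry once under each of its block values.
        for key in self._keys(fp):
            self.tables[key].append((name, fp))

    def query(self, fp):
        """Return {name: distance} for entries within max_distance bits."""
        hits = {}
        for key in self._keys(fp):
            for name, cand in self.tables.get(key, ()):
                d = hamming(fp, cand)
                if d <= self.k:  # verify each candidate exactly
                    hits[name] = d
        return hits
```

In use, one would compute `simhash` over the features of every firmware function (for example, hypothetical strings such as `"cfg:blocks=7"`), `add` each result to the index, and then `query` with the fingerprint of a known vulnerable function. The index stores each fingerprint k + 1 times, trading memory for lookups that avoid a full scan.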