Zhang Lian-hua, Zhang Guan-hua, Zhang Jie, Bai Ying-cai
Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China.
J Zhejiang Univ Sci. 2004 Sep;5(9):1076-86. doi: 10.1631/jzus.2004.1076.
Machine learning-based intrusion detection approaches have recently been studied extensively because they can detect both misuse and anomalies. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a critical step in building the model. RSC performs feature ranking before generating rules, and converts feature ranking into a minimal hitting set problem, which is solved using a genetic algorithm (GA). Classical approaches based on the Support Vector Machine (SVM) perform feature ranking by executing many iterations, each of which removes one useless feature; our method avoids the need for these iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of being easy to interpret. Tests comparing RSC with SVM on DARPA benchmark data showed that, for Probe and DoS attacks, both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).
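The reduction of feature selection to a minimal hitting set problem can be illustrated with a minimal sketch. It assumes the standard rough set construction: for every pair of records with different class labels, collect the set of features on which they differ, then look for a small feature subset that intersects every such set. The toy decision table, the function names (`discernibility_sets`, `cost`, `ga_hitting_set`), and all GA parameters are hypothetical and chosen for illustration. The search is a plain elitist GA, not the hybrid genetic algorithm proposed in the paper, whose details are not given in the abstract.

```python
import random
from itertools import combinations


def discernibility_sets(rows, labels):
    """For each pair of records with different labels, collect the
    indices of the features on which the two records differ."""
    sets = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] != labels[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:
                sets.append(diff)
    return sets


def cost(mask, sets):
    """Lower is better: each unhit set costs more than selecting every
    feature, so any hitting set beats any non-hitting mask."""
    unhit = sum(1 for s in sets if not any(mask[k] for k in s))
    return unhit * (len(mask) + 1) + sum(mask)


def ga_hitting_set(sets, n_features, pop_size=30, generations=60,
                   p_mut=0.1, seed=0):
    """Plain elitist GA over bit masks (assumes n_features >= 2)."""
    rng = random.Random(seed)
    # Seed the population with the all-features mask, which always hits
    # every set, so the best individual is always a valid hitting set.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: cost(m, sets))
        survivors = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = survivors + children
    best = min(pop, key=lambda m: cost(m, sets))
    return {k for k, g in enumerate(best) if g}


# Hypothetical toy decision table: four discretised connection features.
rows = [(0, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 1, 0)]
labels = ["normal", "attack", "normal", "attack"]

sets = discernibility_sets(rows, labels)
selected = ga_hitting_set(sets, n_features=4)
print("selected feature indices:", sorted(selected))
```

Because the all-features mask is kept in the initial population and the best half always survives, the returned subset is guaranteed to hit every discernibility set. Whether it is also of minimum size depends on the GA run, as with any heuristic search.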