Oliver Buchtala, Manuel Klimek, Bernhard Sick
Faculty for Computer Science and Mathematics, University of Passau, Germany.
IEEE Trans Syst Man Cybern B Cybern. 2005 Oct;35(5):928-47. doi: 10.1109/tsmcb.2005.847743.
In many data mining applications that address classification problems, feature selection and model selection are key tasks: appropriate input features for the classifier must be selected from a given (and often large) set of candidate features, and the structure parameters of the classifier must be adapted to these features and to the given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. To reduce the optimization effort, several techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and benefits of the approach are demonstrated on four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. Compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% and error rates are lowered by up to 86%, depending on the application. Because the algorithm does not depend on a specific application, many of its ideas and solutions can be transferred to other classifier paradigms.
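The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a genotype made of a Boolean feature mask plus a number of RBF centers, a (mu + lambda) selection scheme, a penalty on the number of selected features as the soft constraint, and a fitness cache as a loose stand-in for lazy evaluation. The hybrid training is simplified to sampled centers with least-squares output weights, and the temperature-based adaptive control is omitted. All names (`rbf_error`, `evolve`, `penalty`, `max_centers`) are hypothetical.

```python
import numpy as np

def rbf_error(X_tr, y_tr, X_va, y_va, mask, n_centers, seed=0):
    """Validation error of a small RBF classifier on the selected features."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0  # no feature selected: treat as the worst case
    A, B = X_tr[:, cols], X_va[:, cols]
    # Simplification: centers are sampled training points (fixed seed, so the
    # fitness is deterministic), not the result of clustering or gradient steps.
    idx = np.random.default_rng(seed).choice(len(A), size=n_centers, replace=False)
    centers = A[idx]
    # One shared width: mean distance between distinct centers (a heuristic).
    d = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    pos = d[d > 0]
    width = pos.mean() if pos.size else 1.0

    def design(Z):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.hstack([np.exp(-sq / (2 * width ** 2)), np.ones((len(Z), 1))])

    # Output weights by linear least squares on one-hot class targets.
    targets = np.eye(int(y_tr.max()) + 1)[y_tr]
    W, *_ = np.linalg.lstsq(design(A), targets, rcond=None)
    return float(np.mean((design(B) @ W).argmax(1) != y_va))

def evolve(X_tr, y_tr, X_va, y_va, generations=15, pop_size=12,
           penalty=0.01, max_centers=10, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]
    cache = {}  # each genotype is trained at most once

    def fitness(ind):
        mask, k = ind
        key = (mask.tobytes(), k)
        if key not in cache:
            err = rbf_error(X_tr, y_tr, X_va, y_va, mask, k)
            # Soft constraint: every selected feature costs `penalty`.
            cache[key] = err + penalty * int(mask.sum())
        return cache[key]

    def mutate(ind):
        mask, k = ind
        mask = mask.copy()
        flip = rng.random(n_feat) < 1.0 / n_feat   # flip about one bit on average
        mask[flip] = ~mask[flip]
        k = int(np.clip(k + rng.integers(-1, 2), 1, max_centers))
        return mask, k

    pop = [(rng.random(n_feat) < 0.5, int(rng.integers(1, max_centers + 1)))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(pop[rng.integers(pop_size)]) for _ in range(pop_size)]
        # (mu + lambda) selection: keep the best of parents and children.
        pop = sorted(pop + children, key=fitness)[:pop_size]
    best_mask, best_k = pop[0]
    return best_mask, best_k, fitness(pop[0]), len(cache)
```

Calling `evolve(X_tr, y_tr, X_va, y_va)` with integer class labels starting at 0 returns the best feature mask, the number of centers, the penalized validation error, and the number of classifiers actually trained. Because of the cache, that last number can be smaller than the number of individuals the EA visited.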