Sheu Jyh-Jian, Chu Ko-Tsung, Li Nien-Feng, Lee Cheng-Chi
College of Communication, National Chengchi University, Taipei, Taiwan, R.O.C.
Department of Finance, Minghsin University of Science and Technology, Hsinchu, Taiwan, R.O.C.
PLoS One. 2017 Feb 9;12(2):e0171518. doi: 10.1371/journal.pone.0171518. eCollection 2017.
This research conducts an in-depth analysis of spam and aims to propose an efficient spam filtering method that can adapt to a dynamic environment. We focus on analyzing email headers and apply a decision tree data mining technique to find association rules for spam. We then propose an efficient, systematic filtering method based on these association rules. Our method has the following major advantages: (1) It checks only the header sections of emails, unlike current spam filtering methods that must fully analyze an email's content; meanwhile, filtering accuracy is expected to improve. (2) To address concept drift, we propose a window-based technique that estimates the condition of concept drift for each unknown email, which helps our filtering method recognize the occurrence of spam. (3) We propose an incremental learning mechanism for our filtering method to strengthen its ability to adapt to a dynamic environment.
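The abstract does not spell out how the window-based estimate is computed, so the following is only a minimal sketch of one common way such a window can work, not the authors' procedure. It assumes the filter's recent decisions can later be compared with the true labels, and it treats a high error rate within a fixed-size sliding window as a sign of concept drift. The class name `DriftWindow`, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class DriftWindow:
    """Sliding window over recent filter outcomes (hypothetical helper)."""

    def __init__(self, size=100, threshold=0.2):
        self.size = size            # assumed window length, not from the paper
        self.threshold = threshold  # assumed error-rate cutoff, not from the paper
        # Each entry is True when the filter's decision was wrong.
        self.outcomes = deque(maxlen=size)

    def record(self, predicted_spam, actually_spam):
        """Store whether the latest decision disagreed with the true label."""
        self.outcomes.append(predicted_spam != actually_spam)

    def error_rate(self):
        """Fraction of wrong decisions among the emails currently in the window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        """Flag drift only once the window is full and errors exceed the cutoff."""
        return len(self.outcomes) == self.size and self.error_rate() > self.threshold


window = DriftWindow(size=4, threshold=0.25)
for predicted, actual in [(True, True), (False, True), (True, False), (True, True)]:
    window.record(predicted, actual)

if window.drift_suspected():
    # In an incremental-learning setup, this is where newly labeled
    # emails would be used to update the filtering rules.
    print("drift suspected, error rate:", window.error_rate())
```

In a design like this, the drift flag would be the trigger for the incremental learning step, so the rule set is updated only when recent emails stop matching what the filter learned earlier.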