Cyber Security Research and Innovation Centre, Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, VIC 3220, Australia.
Department of Computer Science, Northern Border University, 9280 Arar, Saudi Arabia.
Sensors (Basel). 2021 Feb 16;21(4):1374. doi: 10.3390/s21041374.
Malicious software ("malware") has become a serious cybersecurity issue in the Android ecosystem. Given how quickly new Android malware is released, manually detecting malicious apps in the Android ecosystem is not practically feasible. As a result, machine learning has become an emerging approach to malware detection. Because machine learning performance depends largely on the availability of high-quality, relevant features, feature selection plays a key role in machine-learning-based malware detection. In this paper, we formulate the feature selection problem as a quadratic programming problem and analyse how commonly used filter-based feature selection methods work, with an emphasis on Android malware detection. We compare and contrast several feature selection methods along several factors, including the composition of the relevant features selected. We empirically evaluate the feature subset selection algorithms, comparing their predictive accuracy and execution time across several learning algorithms. The experimental results confirm that feature selection is necessary both for improving the accuracy of the learning models and for decreasing the run time. The results also show that the performance of the feature selection algorithms varies from one learning algorithm to another, and that no single feature selection approach consistently outperforms the others.
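To make the idea of a filter-based feature selection method concrete, the following is a minimal sketch, not the paper's own procedure. It assumes mutual information as the relevance score (the abstract does not name the specific filters studied), and the feature names and toy data are hypothetical, loosely modelled on Android permission flags. Each feature is scored against the malware/benign label independently of any learning algorithm, and the top-k features are kept.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts.
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels, names):
    """Score every feature column against the labels; return (name, score) best first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, mutual_information(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Filter step: keep the names of the k highest-scoring features."""
    return [name for name, _ in rank_features(rows, labels, names)[:k]]


# Hypothetical toy data: one row per app, one binary column per feature.
FEATURE_NAMES = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]
ROWS = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = malware, 0 = benign (illustrative only)

if __name__ == "__main__":
    for name, score in rank_features(ROWS, LABELS, FEATURE_NAMES):
        print(f"{name}: {score:.3f} bits")
    print(select_top_k(ROWS, LABELS, FEATURE_NAMES, k=2))
```

In this toy set, a feature that is present in every app carries no information about the label and is ranked last, which illustrates why filtering can shrink the feature set without hurting accuracy. A pure ranking like this ignores redundancy between features; a quadratic programming formulation, as mentioned in the abstract, is one way to trade off relevance against redundancy.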