Malware detection framework based on graph variational autoencoder extracted embeddings from API-call graphs.

Author information

Gunduz Hakan

Affiliation

Software Engineering Department, Kocaeli University, Kocaeli, Marmara, Turkey.

Publication information

PeerJ Comput Sci. 2022 May 18;8:e988. doi: 10.7717/peerj-cs.988. eCollection 2022.

Abstract

Malware compromises the confidentiality and integrity of information, causing material and moral damage to institutions and individuals. This study proposes a malware detection model based on API-call graphs and uses a Graph Variational Autoencoder (GVAE) to reduce the size of the graph node features extracted from Android apk files. The GVAE-reduced embeddings were fed to a linear model (SVM) and an ensemble model (LightGBM) to complete the detection process. To validate the effectiveness of the GVAE-reduced features, recursive feature elimination (RFE) and Fisher score (FS) were applied to select informative feature sets of the same sizes as the GVAE-reduced embeddings. Among the RFE and FS selections, LightGBM with 50 RFE-selected features achieved the highest accuracy (0.907) and F-measure (0.852). Using the GVAE-reduced embeddings for classification raised the accuracy of both models by approximately 4%. The F-measure rose by a similar margin, which directly indicates an improvement in the models' discriminative power. A final experiment that combined RFE selection with GVAE improved performance over the GVAE-reduced embeddings alone: with 30 relevant features selected by RFE from the combination of all GVAE embeddings, LightGBM reached an accuracy of 0.967.
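
As an illustration of the Fisher score (FS) baseline mentioned in the abstract, the following is a minimal sketch of score-based feature ranking. The abstract does not state which formulation the study used, so this assumes the common definition (between-class scatter divided by within-class scatter, computed per feature). The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical and are not taken from the paper.

```python
def fisher_scores(X, y):
    """Return one Fisher score per feature column.

    Assumed definition (not confirmed by the paper):
    sum_c n_c * (mean_c - mean)^2  /  sum_c n_c * var_c
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between = 0.0  # spread of class means around the overall mean
        within = 0.0   # spread of samples around their own class mean
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            n_c = len(vals)
            mean_c = sum(vals) / n_c
            var_c = sum((v - mean_c) ** 2 for v in vals) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # A constant feature has zero within-class scatter; score it 0.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 6 samples, 3 features, labels 0 = benign, 1 = malware.
# Only feature 0 separates the two classes.
X = [
    [0.0, 5.0, 1.0],
    [0.2, 3.0, 1.1],
    [0.1, 4.0, 0.9],
    [1.0, 4.5, 1.0],
    [1.2, 3.5, 1.2],
    [1.1, 4.0, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

selected = top_k_features(X, y, k=1)
print(selected)
```

In the study's setting, `k` would be set to the size of the GVAE-reduced embedding so that the selected feature set and the embedding are compared at equal dimensionality. RFE differs in that it repeatedly fits a model and drops the weakest features rather than scoring each feature once.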

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74cc/9137949/0ed3228f8d65/peerj-cs-08-988-g001.jpg
