Authorship attribution of source code by using back propagation neural network based on particle swarm optimization.

Author information

Yang Xinyu, Xu Guoai, Li Qi, Guo Yanhui, Zhang Miao

Affiliation

National Engineering Lab for Mobile Network Technologies, Beijing University of Posts and Telecommunications, Beijing, China.

Publication information

PLoS One. 2017 Nov 2;12(11):e0187204. doi: 10.1371/journal.pone.0187204. eCollection 2017.

Abstract

Authorship attribution identifies the most likely author of a given sample from a set of known candidate authors. It can be applied not only to plain text such as novels, blogs, emails, and posts, but also to source code, where the goal is to identify the programmer. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to resolving authorship disputes and detecting software plagiarism. This paper proposes a new method that identifies the programmer of Java source code samples with higher accuracy. To this end, it first introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. The method begins by computing a set of defined feature metrics, comprising lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then fed to the neural network for supervised learning, with the network weights produced by a hybrid PSO and BP algorithm. The effectiveness of the proposed method is evaluated on a collected dataset of 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy. A comparison with previous work on authorship attribution of Java source code shows that the proposed method outperforms the others overall, with acceptable overhead.
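The abstract does not spell out how PSO and BP are combined, so the following is only a minimal sketch of one common arrangement: PSO performs a global search over the network weights, and BP (gradient descent) then refines the best particle. Everything concrete here is an assumption made for illustration, not the paper's configuration: the toy two-feature, two-author data set `DATA`, the single hidden layer, the sigmoid output with squared error, the PSO coefficients (0.7, 1.5, 1.5), and all function names.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def unpack(w, n_in, n_hid):
    """Split a flat weight vector into hidden rows (weights + bias) and the output row."""
    step = n_in + 1
    W1 = [w[j * step:(j + 1) * step] for j in range(n_hid)]
    W2 = w[n_hid * step:n_hid * step + n_hid + 1]
    return W1, W2

def forward(w, x, n_in, n_hid):
    W1, W2 = unpack(w, n_in, n_hid)
    h = [sigmoid(sum(a * b for a, b in zip(row, x)) + row[-1]) for row in W1]
    y = sigmoid(sum(a * b for a, b in zip(W2, h)) + W2[-1])
    return h, y

def loss(w, data, n_in, n_hid):
    """Mean squared error over the data set."""
    return sum((forward(w, x, n_in, n_hid)[1] - t) ** 2 for x, t in data) / len(data)

def pso(data, n_in, n_hid, n_particles=20, iters=50, rng=None):
    """Global search: each particle is one candidate flat weight vector."""
    if rng is None:
        rng = random.Random(0)
    dim = n_hid * (n_in + 1) + n_hid + 1
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [loss(p, data, n_in, n_hid) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = loss(pos[i], data, n_in, n_hid)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

def bp(w, data, n_in, n_hid, lr=0.5, epochs=200):
    """Local refinement: per-sample gradient descent on 0.5 * (y - t)^2."""
    w = w[:]
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(w, x, n_in, n_hid)
            _, W2 = unpack(w, n_in, n_hid)
            dy = (y - t) * y * (1 - y)  # error signal at the output pre-activation
            grad = []
            for j in range(n_hid):
                dh = dy * W2[j] * h[j] * (1 - h[j])  # error propagated back to hidden unit j
                grad += [dh * xi for xi in x] + [dh]
            grad += [dy * hj for hj in h] + [dy]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def train(data, n_in, n_hid, seed=0):
    """PSO first, then BP starting from the best particle found."""
    w_pso, pso_loss = pso(data, n_in, n_hid, rng=random.Random(seed))
    w = bp(w_pso, data, n_in, n_hid)
    return w, pso_loss, loss(w, data, n_in, n_hid)

# Hypothetical toy data: two style metrics per file, label 0 or 1 for two authors.
DATA = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]
```

Calling `train(DATA, n_in=2, n_hid=3)` returns the refined weights together with the loss after the PSO stage and after the BP stage. In the setting the abstract describes, the input would instead be the 19-dimensional metric vector and the output layer would cover 40 authors, which this binary toy network does not attempt to reproduce.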

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0bd/5667828/484933823b06/pone.0187204.g001.jpg
