
A novel code representation for detecting Java code clones using high-level and abstract compiled code representations.

Affiliations

Department of Computer Science, University of Peshawar, Peshawar, Pakistan.

Department of Computer Science, Aden Community College, Aden, Yemen.

Publication information

PLoS One. 2024 May 10;19(5):e0302333. doi: 10.1371/journal.pone.0302333. eCollection 2024.

DOI:10.1371/journal.pone.0302333
PMID:38728285
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11086904/
Abstract

In software development, it is common to reuse existing source code by copying and pasting, which leads to a proliferation of code clones (similar or identical code fragments) that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many struggle to identify semantic clones effectively because they cannot extract both syntactic and semantic information. Fewer techniques leverage low-level code representations such as bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools such as the Soot framework. Using this combined representation, fifteen machine-learning models are trained to detect code clones. Evaluation on a large dataset demonstrates the models' efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform current clone detection techniques in detecting semantic clones.
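The abstract names three ways of combining the features of two code fragments (linear, multiplication, and distance) without defining them. The sketch below illustrates one common reading from the clone-detection literature, and it is an assumption rather than the authors' confirmed method: "linear" is taken to mean concatenating the two feature vectors, "multiplication" the element-wise product, and "distance" the element-wise absolute difference. The function names and the feature values are hypothetical.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    """Element-wise combinations need vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def combine_linear(a: Vector, b: Vector) -> Vector:
    """Assumed meaning of 'linear combination': concatenate both vectors."""
    return list(a) + list(b)


def combine_multiplicative(a: Vector, b: Vector) -> Vector:
    """Element-wise product of the two feature vectors."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def combine_distance(a: Vector, b: Vector) -> Vector:
    """Element-wise absolute difference of the two feature vectors."""
    _check_same_length(a, b)
    return [abs(x - y) for x, y in zip(a, b)]


# Hypothetical feature counts for two code fragments
# (for example, numbers of loops, branches, and method calls).
fragment_a = [3.0, 1.0, 4.0]
fragment_b = [3.0, 2.0, 0.0]

print(combine_linear(fragment_a, fragment_b))          # [3.0, 1.0, 4.0, 3.0, 2.0, 0.0]
print(combine_multiplicative(fragment_a, fragment_b))  # [9.0, 2.0, 0.0]
print(combine_distance(fragment_a, fragment_b))        # [0.0, 1.0, 4.0]
```

Under this reading, the concatenated vector keeps every feature of both fragments and is twice as long, while the other two schemes compress the pair into a vector of the original length. Each combined vector would then be the input row for a pair-level classifier such as LightGBM.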

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/7291043e3305/pone.0302333.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/6dc060cf1319/pone.0302333.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/4de6930945cd/pone.0302333.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/13fd98146a63/pone.0302333.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/90e00e038934/pone.0302333.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/1188af3e857d/pone.0302333.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/9216a3c8b455/pone.0302333.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/9240ea7e89cf/pone.0302333.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/8f3e61aa4cc5/pone.0302333.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/2d3adbf29b7b/pone.0302333.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/fae3fb944001/pone.0302333.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/01f2865e798b/pone.0302333.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/1dca8675c22d/pone.0302333.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/1cae7b4d2113/pone.0302333.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/03cda0c708dd/pone.0302333.g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/ac27f143d23c/pone.0302333.g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/c1bbd77a0db8/pone.0302333.g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/11086904/dd2ae80400f8/pone.0302333.g018.jpg

Similar articles

1
A novel code representation for detecting Java code clones using high-level and abstract compiled code representations.
PLoS One. 2024 May 10;19(5):e0302333. doi: 10.1371/journal.pone.0302333. eCollection 2024.
2
A systematic literature review on the applications of recurrent neural networks in code clone research.
PLoS One. 2024 Feb 2;19(2):e0296858. doi: 10.1371/journal.pone.0296858. eCollection 2024.
3
CRaDLe: Deep code retrieval based on semantic Dependency Learning.
Neural Netw. 2021 Sep;141:385-394. doi: 10.1016/j.neunet.2021.04.019. Epub 2021 Apr 26.
4
Vulnerability detection in Java source code using a quantum convolutional neural network with self-attentive pooling, deep sequence, and graph-based hybrid feature extraction.
Sci Rep. 2024 Mar 28;14(1):7406. doi: 10.1038/s41598-024-56871-z.
5
Semantic aware-based instruction embedding for binary code similarity detection.
PLoS One. 2024 Jun 11;19(6):e0305299. doi: 10.1371/journal.pone.0305299. eCollection 2024.
6
Enriching query semantics for code search with reinforcement learning.
Neural Netw. 2022 Jan;145:22-32. doi: 10.1016/j.neunet.2021.09.025. Epub 2021 Oct 11.
7
Python code smells detection using conventional machine learning models.
PeerJ Comput Sci. 2023 May 29;9:e1370. doi: 10.7717/peerj-cs.1370. eCollection 2023.
8
Semantic framework for mapping object-oriented model to semantic web languages.
Front Neuroinform. 2015 Feb 25;9:3. doi: 10.3389/fninf.2015.00003. eCollection 2015.
9
Semantic and traditional feature fusion for software defect prediction using hybrid deep learning model.
Sci Rep. 2024 Jul 1;14(1):14771. doi: 10.1038/s41598-024-65639-4.
10
Authorship attribution of source code by using back propagation neural network based on particle swarm optimization.
PLoS One. 2017 Nov 2;12(11):e0187204. doi: 10.1371/journal.pone.0187204. eCollection 2017.
