CRaDLe: Deep code retrieval based on semantic Dependency Learning.

Author information

Gu Wenchao, Li Zongjie, Gao Cuiyun, Wang Chaozheng, Zhang Hongyu, Xu Zenglin, Lyu Michael R

Affiliations

The Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.

The School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China.

Publication information

Neural Netw. 2021 Sep;141:385-394. doi: 10.1016/j.neunet.2021.04.019. Epub 2021 Apr 26.

Abstract

Code retrieval is a common way for programmers to reuse existing code snippets from open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims to find the most relevant snippets in a set of code snippets. The main challenge of effective code retrieval lies in bridging the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies have turned to neural networks to learn the semantic matching relationships between the two sources. Statement-level dependency information highlights the dependency relations among program statements during execution and reflects the structural importance of each statement in the code. This information is favorable for accurately capturing code semantics, but it has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations by fusing dependency and semantic information at the statement level, and then learns a unified vector representation for each code and description pair to model the matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach accurately retrieves code snippets for a given query and significantly outperforms state-of-the-art approaches on the task.
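To make the notion of statement-level dependency concrete, here is a minimal sketch in Python. It is not CRaDLe's implementation: the abstract does not say which programming language or analysis tool the authors use. The function name `statement_dependencies` and the simple def-use rule are assumptions made for illustration, and the sketch handles only straight-line, top-level statements (no control flow). It records, for each statement, which earlier statements last assigned the variables it reads.

```python
import ast
from typing import Dict, Set


def statement_dependencies(source: str) -> Dict[int, Set[int]]:
    """Map each top-level statement index to the indices of the earlier
    statements that most recently assigned a variable it reads.

    Hypothetical helper for illustration only; a simple def-use relation
    over straight-line code, not the dependency analysis from the paper.
    """
    tree = ast.parse(source)
    last_def: Dict[str, int] = {}  # variable name -> index of latest assigning statement
    deps: Dict[int, Set[int]] = {}

    for index, stmt in enumerate(tree.body):
        reads: Set[str] = set()
        writes: Set[str] = set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    reads.add(node.id)
                else:  # Store or Del context
                    writes.add(node.id)
            # `x += 1` marks x only as stored, but it also reads x.
            if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
                reads.add(node.target.id)

        # Resolve reads against definitions made by earlier statements.
        deps[index] = {last_def[name] for name in reads if name in last_def}
        for name in writes:
            last_def[name] = index

    return deps


if __name__ == "__main__":
    snippet = (
        "a = read_input()\n"
        "b = a * 2\n"
        "c = 5\n"
        "d = b + c\n"
        "print(d)\n"
    )
    for index, sources in statement_dependencies(snippet).items():
        print(index, sorted(sources))
    # 0 []
    # 1 [0]
    # 2 []
    # 3 [1, 2]
    # 4 [3]
```

In this toy snippet, statement 3 (`d = b + c`) depends on statements 1 and 2, and the final `print(d)` depends on statement 3. Relations of this kind are what the abstract refers to as dependency information at the statement level; how CRaDLe encodes and fuses them with semantic information is described in the paper itself, not here.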
