Program retrieval remains a cornerstone of software development, crucial for boosting productivity throughout the development lifecycle. However, many existing program retrieval models ignore the disparities between natural language queries and code, which leaves a prominent semantic gap between the two. Moreover, programs and queries carry rich structural and semantic information, yet prevailing approaches often overlook the cohesion among different aspects of source code and treat queries as flat token sequences, neglecting their inherent structure.
Credit: Qianwen GOU, Yunwei DONG, YuJiao WU, Qiao KE
To address these problems, a research team led by Yunwei DONG published their new research on 15 June 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed a framework that formulates program retrieval as a multi-relational graph similarity problem. It uses a dual-level attention mechanism that weights the nodes of each multi-relational graph at two levels: intra-relation attention (within a single relation type) and inter-relation attention (across relation types).
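The release does not give the attention equations. As a rough illustration only, the sketch below assumes a design in the style of the heterogeneous graph attention network (HAN): intra-relation attention averages a node's neighbours under one relation type, and inter-relation attention assigns one weight per relation type and fuses the per-relation embeddings. The function names, the scoring rule, and the fixed parameter vectors a_params and q (which a real model would learn) are all hypothetical; the relation names "AST" and "CFG" are borrowed from edge types that a code property graph combines.

```python
# Hypothetical sketch of dual-level attention; not the authors' implementation.
import math

Vector = list[float]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights: list[float], vectors: list[Vector]) -> Vector:
    """Return sum_k weights[k] * vectors[k], computed component-wise."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def intra_relation_attention(h: dict[str, Vector],
                             edges: list[tuple[str, str]],
                             a: Vector) -> dict[str, Vector]:
    """Node-level attention inside ONE relation type (assumed scoring rule).

    Each node i attends over its in-neighbours j under this relation with the
    score a . [h_i || h_j]; softmax-normalised scores weight the neighbour
    features. A node with no neighbour under this relation keeps h_i.
    """
    neighbours: dict[str, list[str]] = {node: [] for node in h}
    for src, dst in edges:
        neighbours[dst].append(src)
    out = {}
    for i, nbrs in neighbours.items():
        if not nbrs:
            out[i] = list(h[i])
            continue
        # h[i] + h[j] concatenates the two Python lists, i.e. [h_i || h_j].
        alphas = softmax([dot(a, h[i] + h[j]) for j in nbrs])
        out[i] = weighted_sum(alphas, [h[j] for j in nbrs])
    return out


def inter_relation_attention(per_relation: dict[str, dict[str, Vector]],
                             q: Vector) -> tuple[dict[str, Vector], dict[str, float]]:
    """Relation-level attention: one weight per relation, shared by all nodes.

    Importance of relation r is the mean of q . tanh(z_i^r) over nodes (the
    learned projection before tanh is omitted); softmax gives weights beta_r,
    and each node's final embedding is the beta-weighted sum over relations.
    """
    relations = list(per_relation)
    importance = []
    for r in relations:
        z = per_relation[r]
        total = sum(dot(q, [math.tanh(x) for x in vec]) for vec in z.values())
        importance.append(total / len(z))
    betas = softmax(importance)
    nodes = per_relation[relations[0]]
    fused = {i: weighted_sum(betas, [per_relation[r][i] for r in relations]) for i in nodes}
    return fused, dict(zip(relations, betas))


def dual_level_attention(h, relations, a_params, q):
    """Intra-relation attention per relation, then inter-relation fusion."""
    per_relation = {r: intra_relation_attention(h, edges, a_params[r])
                    for r, edges in relations.items()}
    return inter_relation_attention(per_relation, q)


if __name__ == "__main__":
    # Toy graph with two illustrative relation types; all values are made up.
    h = {"n0": [1.0, 0.0], "n1": [0.0, 1.0], "n2": [1.0, 1.0]}
    relations = {"AST": [("n0", "n1"), ("n2", "n1")], "CFG": [("n1", "n2")]}
    a_params = {"AST": [0.5] * 4, "CFG": [0.5] * 4}
    fused, betas = dual_level_attention(h, relations, a_params, q=[1.0, 1.0])
    print(betas)
    print(fused)
```

Because the inter-relation weights in this sketch are shared by every node, a relation type that carries little signal is down-weighted for the whole graph rather than node by node.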
The framework works in three stages. First, the multi-relational graph construction module represents programs as code property graphs (CPGs) and queries as abstract meaning representations (AMRs), capturing program and query semantics more comprehensively than plain token sequences. Second, a dual-level attention graph neural network learns semantic representations of the AMRs and CPGs. Finally, a semantic similarity calculation module computes the similarity of each query-program pair. Compared with existing methods, the proposed approach performs well relative to all the baselines evaluated.
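The release does not say how the similarity calculation module scores a pair. A minimal sketch of that final stage, assuming a mean-pooling readout over node embeddings, cosine similarity, and a margin ranking loss for training (a common choice in code search, but an assumption here), could look like this. The toy embeddings stand in for GNN outputs on an AMR and on CPGs; every name and value is hypothetical.

```python
# Hypothetical sketch of query-program similarity scoring; not the authors' implementation.
import math

Vector = list[float]
Graph = dict[str, Vector]  # node id -> node embedding produced by the GNN


def mean_pool(graph: Graph) -> Vector:
    """Readout: average all node embeddings into one graph-level vector."""
    vectors = list(graph.values())
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_programs(query: Graph, programs: dict[str, Graph]) -> list[tuple[str, float]]:
    """Score every candidate program against the query and sort best-first."""
    q = mean_pool(query)
    scored = [(name, cosine(q, mean_pool(g))) for name, g in programs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def margin_ranking_loss(query: Graph, positive: Graph, negative: Graph,
                        margin: float = 0.5) -> float:
    """Hinge loss pushing s(query, positive) above s(query, negative) by `margin`."""
    q = mean_pool(query)
    s_pos = cosine(q, mean_pool(positive))
    s_neg = cosine(q, mean_pool(negative))
    return max(0.0, margin - s_pos + s_neg)


if __name__ == "__main__":
    # Toy embeddings standing in for GNN outputs on an AMR (query) and CPGs (programs).
    query = {"amr_0": [1.0, 0.0], "amr_1": [0.8, 0.2]}
    programs = {
        "sort_list": {"cpg_0": [0.9, 0.1], "cpg_1": [1.0, 0.0]},
        "open_file": {"cpg_0": [0.0, 1.0], "cpg_1": [0.1, 0.9]},
    }
    print([name for name, _ in rank_programs(query, programs)])  # ['sort_list', 'open_file']
```

Mean pooling is the simplest readout; an attention-weighted readout would let informative nodes dominate the graph vector, at the cost of extra parameters.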
Future research could focus on pruning extraneous information from the multi-relational graphs to reduce their complexity. Another promising direction is integrating external knowledge, such as knowledge graphs, to enrich the representation of program semantics.
DOI: 10.1007/s11704-023-2678-8
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Semantic similarity-based program retrieval: a multi-relational graph perspective
Article Publication Date
15-Apr-2024