Comu: Contextual and Multi-Grained Code Representation Learning for Commit Message Generation
15 Pages Posted: 15 Jul 2023
Abstract
Commit messages, which precisely describe the code changes of each commit in natural language, make it possible for developers and subsequent reviewers to understand the changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several researchers have proposed automated techniques to generate commit messages, but these techniques do not sufficiently exploit the information contained in the code. In this paper, we propose contextual and multi-grained code representation learning for Commit Message Generation (COMU). We first use the contextual information of the code to construct global semantic information (i.e., Code_Diff). We then extract the code structure from the source code changes from different perspectives and combine the extracted structure with fine-grained editing operations to focus explicitly on the details of the changed part (i.e., AST_Diff). In addition, we build the experimental datasets ourselves, since no adequate public dataset yet exists for this task. The release of these datasets should help advance research in this field. We perform extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and a human study show that our model outperforms the baseline model.
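To make the two granularities concrete, the following is a minimal illustrative sketch and not the COMU implementation. It assumes Python source, uses the standard-library `ast` and `difflib` modules as stand-ins for whatever parser and differencing tool the paper uses, and the helper names `node_types` and `edit_ops` are hypothetical. It derives token-level edit operations from the raw code (a rough analogue of Code_Diff) and edit operations over AST node types (a rough analogue of AST_Diff).

```python
import ast
import difflib


def node_types(source: str) -> list[str]:
    """Flatten a Python snippet into a sequence of AST node type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def edit_ops(old: list[str], new: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-equal edit operations (replace/insert/delete) between two sequences."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical before/after versions of a changed function.
old_src = "def area(r):\n    return 3.14 * r * r\n"
new_src = "def area(r):\n    return math.pi * r * r\n"

# Coarse view: edits over whitespace-separated code tokens.
code_diff = edit_ops(old_src.split(), new_src.split())

# Structural view: edits over the sequence of AST node types.
ast_diff = edit_ops(node_types(old_src), node_types(new_src))

print(code_diff)
print(ast_diff)
```

In this sketch the token view reports only that one literal was replaced, while the structural view additionally exposes that a constant became an attribute access, which is the kind of detail a purely textual diff leaves implicit.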
Keywords: Code Change, Code Representation Learning, Commit Message Generation, Pre-training