Effective Source Code Vectorization for Vulnerability Detection Using Deep Learning and Attention Mechanism
29 Pages. Posted: 29 Jan 2023
Abstract
With the increasing availability of computational power, deep learning methods have been widely used to detect software vulnerabilities in recent years. Compared with traditional machine learning techniques, deep learning methods offer low computational overhead and high vulnerability detection accuracy, and they do not depend on expert knowledge to extract vulnerability features. However, the performance of many existing deep vulnerability detection methods is degraded because they capture inadequate information about the syntax and semantics of source code. This paper proposes VDDA (Vulnerability Detection based on Deep learning and Attention mechanisms), a software vulnerability detection model that combines deep learning with an attention mechanism. In VDDA, a bidirectional long short-term memory (BLSTM) network is used to reduce the need for the feature engineering required by traditional machine learning techniques. With the Joern analysis tool, the source code is converted to code property graphs (CPGs) to retain rich syntactic and semantic information. Several improvements are made to convert source code effectively into vectors that serve as the sole input to the underlying deep learning model: depth-first traversal-based CPG optimization, three-direction code slicing, slice organization with code blocks, and the separation of function names from variable names in code symbolization. Because different parts of the vector play different roles in vulnerability detection, the attention mechanism is integrated with the BLSTM to further improve detection performance. Experimental results on two datasets of different scales demonstrate that the proposed VDDA outperforms many existing methods in vulnerability detection.
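To make the symbolization step concrete, the following is a minimal sketch of renaming user-defined identifiers while numbering function names and variable names separately. The abstract does not give VDDA's actual rules, so everything here is an assumption for illustration: the `FUN`/`VAR` prefixes, the small keyword and library-call lists, the helper name `symbolize`, and the heuristic that an identifier followed by `(` is a function name.

```python
import re

# Assumption: small illustrative lists of C keywords and library calls that
# are left untouched; a real tool would use complete lists.
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen", "printf"}

# Whole identifiers only; the word boundaries keep literals such as 0x10 intact.
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def symbolize(code: str) -> str:
    """Rename user-defined identifiers, numbering functions and variables separately.

    Hypothetical helper: string literals and comments are not handled.
    """
    functions = {}  # original function name -> FUNn
    variables = {}  # original variable name -> VARn

    def rename(match):
        name = match.group(0)
        if name in KEYWORDS or name in LIBRARY_CALLS:
            return name
        # Heuristic: an identifier directly followed by "(" is a function name.
        is_function = code[match.end():].lstrip().startswith("(")
        table, prefix = (functions, "FUN") if is_function else (variables, "VAR")
        if name not in table:
            table[name] = f"{prefix}{len(table) + 1}"
        return table[name]

    return IDENTIFIER.sub(rename, code)


snippet = "void copy_data(char *src) { char buf[16]; strcpy(buf, src); log_msg(buf); }"
print(symbolize(snippet))
```

Keeping two separate symbol tables means the symbolized tokens still show whether an identifier was called or merely read and written, while the original names are discarded.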
Keywords: Software security, Vulnerability detection, Deep learning, Attention mechanism