Linear Algebra for Graph Processing on GPUs: Opportunities and Limitations
Abstract
Graphics Processing Units (GPUs) show great potential for exploiting the inherent parallelism of graph applications. However, issues such as the high demand for global synchronization across running threads, irregular memory access patterns, and load imbalance make graph processing challenging on GPUs. Prior work proposes various GPU-based frameworks that rely on different parallel programming models to implement graph algorithms efficiently on GPUs. However, each of these frameworks targets only a few graph processing issues, and none of them addresses all of them. Linear algebra is a powerful paradigm that can potentially address these challenges by expressing a graph algorithm as a sequence of primitive matrix operations, whose behavior is more regular. Both industry and academia have developed several linear algebraic libraries (e.g., nvGRAPH, GraphBLAST, GBTL) for implementing graph applications on GPUs. However, to the best of our knowledge, no prior work comprehensively studies the opportunities and limitations of linear algebra in graph processing from a GPU architectural perspective. In this paper, we aim to (1) characterize the performance of linear algebraic graph processing on GPUs, (2) comprehensively study the reasons behind its performance improvement and degradation with respect to non-linear-algebraic implementations, and (3) derive several key insights and solutions that mitigate the performance limiters of linear algebraic graph processing. To this end, we characterize six well-known graph algorithms using 160 real-world datasets on a real machine and a GPU simulator (i.e., Accel-Sim). Based on our findings, we discuss potential hardware/software research directions for improving graph analysis performance. As a case study, we devise two software-based optimization techniques that reduce the number of executed instructions by incorporating algorithm semantics into the matrix operations. Our experimental results show up to 39.2× (8.9× on average) speedup with the proposed optimization techniques.
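To illustrate the linear-algebraic formulation the abstract refers to, the following is a minimal sketch (not taken from the paper) of breadth-first search expressed as iterated Boolean sparse matrix-vector products over an (OR, AND) semiring, in the spirit of GraphBLAS-style libraries such as GraphBLAST and GBTL; the function name and data layout are illustrative assumptions.

```python
# Illustrative sketch only: BFS as repeated Boolean matrix-vector products
# y = A^T x over the (OR, AND) semiring, the kind of primitive matrix
# operation that linear-algebraic graph frameworks expose.

def bfs_linear_algebra(adj, source):
    """adj[u] lists the neighbors of u; returns the BFS level of each vertex."""
    n = len(adj)
    level = [-1] * n
    frontier = {source}          # sparse Boolean vector x (the current frontier)
    level[source] = 0
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        # One semiring SpMV: OR over (edge AND frontier membership),
        # masked by the set of already-visited vertices. Folding this mask
        # into the operation is one way algorithm semantics can cut work.
        for u in frontier:
            for v in adj[u]:
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.add(v)
        frontier = next_frontier
    return level

# Example: 4-vertex path graph 0-1-2-3
adj = [[1], [0, 2], [1, 3], [2]]
print(bfs_linear_algebra(adj, 0))  # prints [0, 1, 2, 3]
```

Each loop iteration corresponds to one primitive matrix operation, which is why the computation is more regular than a hand-written vertex-centric traversal and maps naturally onto GPU libraries.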
Keywords: GPUs, Graph Processing, Linear Algebra