Multi-GPU Work Sharing in a Task-Based Dataflow Programming Model
12 Pages. Posted: 4 Apr 2023
Abstract
Today, multi-GPU computing nodes are the mainstay of most high-performance computing systems. Despite significant progress in programmability, building an application that efficiently utilizes all the GPUs in a computing node remains a significant challenge, especially with the existing shared-memory and message-passing paradigms. In this context, the task-based dataflow programming model has emerged as an alternative for programming multi-GPU computing nodes. Most task-based dataflow runtimes perform dynamic task mapping, assigning tasks to GPUs based on the current load; however, once a task has been mapped, it is not re-balanced even if an imbalance is later detected. In this paper, we examine how automatic dynamic work sharing between GPUs within a compute node can improve application performance through better workload distribution. We demonstrate the resulting performance improvement using a Block-Sparse GEneral Matrix Multiplication (BSpGEMM) benchmark. While we demonstrate this in PaRSEC, a task-based dataflow runtime, the ideas discussed here are transferable to any task-based dataflow runtime.
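As a rough, runtime-agnostic illustration of the idea (not PaRSEC's actual implementation or API), the sketch below models each GPU as a manager thread with its own ready queue: a GPU that drains its locally mapped tasks takes tasks originally mapped to another GPU, which is the kind of re-balancing the paper studies. All names here (Task, GpuQueue, gpu_manager) are hypothetical.

// Minimal work-sharing sketch between per-GPU ready queues (hypothetical names).
#include <atomic>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Task { int id; };                 // stand-in for a runtime task

struct GpuQueue {
    std::mutex m;
    std::deque<Task> tasks;

    bool pop(Task &t) {                  // owning GPU takes from the front
        std::lock_guard<std::mutex> lk(m);
        if (tasks.empty()) return false;
        t = tasks.front(); tasks.pop_front();
        return true;
    }
    bool share(Task &t) {                // another GPU takes from the back
        std::lock_guard<std::mutex> lk(m);
        if (tasks.empty()) return false;
        t = tasks.back(); tasks.pop_back();
        return true;
    }
};

// Each GPU's manager thread drains its own queue; when empty, it pulls
// tasks that were originally mapped to another GPU (work sharing).
void gpu_manager(int gpu, std::vector<GpuQueue> &queues, std::atomic<int> &remaining) {
    while (remaining.load() > 0) {
        Task t;
        bool got = queues[gpu].pop(t);
        if (!got) {
            for (size_t v = 0; v < queues.size() && !got; ++v)
                if ((int)v != gpu) got = queues[v].share(t);
        }
        if (got) {
            // A real runtime would launch the task's kernel on this GPU here.
            std::printf("GPU %d executes task %d\n", gpu, t.id);
            remaining.fetch_sub(1);
        } else {
            std::this_thread::yield();   // nothing to run or share yet
        }
    }
}

int main() {
    const int ngpu = 2, ntasks = 8;
    std::vector<GpuQueue> queues(ngpu);
    std::atomic<int> remaining(ntasks);
    // Deliberately imbalanced initial mapping: all tasks land on GPU 0.
    for (int i = 0; i < ntasks; ++i) queues[0].tasks.push_back({i});

    std::vector<std::thread> managers;
    for (int g = 0; g < ngpu; ++g)
        managers.emplace_back(gpu_manager, g, std::ref(queues), std::ref(remaining));
    for (auto &th : managers) th.join();
}

With this imbalanced initial mapping, GPU 1 immediately shares part of GPU 0's queue, so both devices finish at roughly the same time; without the share() path, GPU 1 would sit idle while GPU 0 executes every task.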
Keywords: Tasks, Runtime, Work Sharing, PaRSEC, GPU