Abstract
Sparse general matrix-matrix multiplication (SpGEMM) is a major kernel in various emerging applications, such as database management systems, deep learning, graph analysis, and recommendation systems. Since SpGEMM requires extensive computation, many SpGEMM techniques have been implemented on graphics processing units (GPUs) to fully exploit massive data parallelism. However, traditional SpGEMM techniques usually do not fully utilize the GPU because most non-zero elements of the target sparse matrices are concentrated in a few hub nodes, while non-hub nodes have barely any non-zero elements. This data characteristic (a power-law distribution) significantly degrades performance because of load imbalance between the GPU cores and low utilization of each core. Recent implementations have attempted to solve this problem with smart pre-/post-processing; however, the net performance hardly improves and sometimes even deteriorates owing to the large overheads. Additionally, non-hub nodes are inherently not suitable for GPU computing, even after optimization. Furthermore, owing to the rapid growth in GPU computing power and input data size, performance is no longer dominated by kernel execution but by data transfers, such as device-to-host (D2H) transfers and file I/O.

Therefore, this work proposes the Dynamic Block Distributor (DBD), a novel full-system-level SpGEMM orchestration framework for heterogeneous systems that improves overall performance by enabling efficient CPU-GPU collaboration and further minimizing the data-transfer overhead among all system elements. The framework first divides the target matrix into smaller blocks and then offloads the computation of each block to the appropriate computing unit, either the GPU or the CPU, based on its workload type and the status of resource utilization at runtime.
It also minimizes the data-transfer overhead with simple but effective techniques, such as Row Collecting, I/O Overlapping, and I/O Binding. Our experiments showed that this framework improved the execution latency of SpGEMM, including both kernel execution and D2H transfers, by 3.24x on average, and the overall execution time by 2.07x on average, compared to the baseline cuSPARSE library.
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023 |
Publisher | IEEE Computer Society |
Pages | 2456-2459 |
Number of pages | 4 |
ISBN (Electronic) | 9798350322279 |
DOIs | |
Publication status | Published - 2023 |
Event | 39th IEEE International Conference on Data Engineering, ICDE 2023 - Anaheim, United States. Duration: 2023 Apr 3 → 2023 Apr 7 |
Publication series
Name | Proceedings - International Conference on Data Engineering |
---|---|
Volume | 2023-April |
ISSN (Print) | 1084-4627 |
Conference
Conference | 39th IEEE International Conference on Data Engineering, ICDE 2023 |
---|---|
Country/Territory | United States |
City | Anaheim |
Period | 23/4/3 → 23/4/7 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Information Systems