SHREG: Mitigating register redundancy in GPUs

Seunghyun Jin, Hyunwuk Lee, Jonghyun Lee, Junsung Kim, Won Woo Ro

Research output: Contribution to journal > Article > peer-review

Abstract

Graphics Processing Units (GPUs) have become the dominant accelerators for Machine Learning (ML) and High-Performance Computing (HPC) applications owing to their massive parallelism, which these applications exploit through general matrix-to-matrix multiplication (GEMM) kernels. However, GEMM kernels often suffer from duplicated memory requests, mainly caused by the matrix tiling used to handle large matrices. GPUs have adopted programmable shared memory to mitigate this issue by preserving frequently reused data in shared memory, but GEMM still introduces duplication in the register files. Our observations show that matrix tiling makes neighboring threads issue memory requests to the same shared memory address, which substantially increases the amount of duplicated data in the register files. Such duplication degrades GPU performance in two ways: the resulting register shortage limits warp-level parallelism, and the repeated loads add redundant requests to shared memory. We find that the data duplication falls into two types, each occurring in a fixed pattern during matrix tiling. Based on these observations, we introduce SHREG, an architecture design that enables different threads to share registers for overlapping data from shared memory, effectively reducing duplicated data within the register files. By leveraging the duplication patterns, SHREG achieves register sharing with minimal hardware overhead and improves performance. Our evaluation shows that SHREG improves performance by 31.4% on various ML applications over the baseline GPU.
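
The duplication described in the abstract can be illustrated with a small counting model. The sketch below is not SHREG and is not taken from the paper; it assumes a textbook tiled GEMM kernel in which thread (ty, tx) copies As[ty][k] and Bs[k][tx] from shared memory into registers at each inner step k. The tile width of 16, the row-major thread layout, and the names As, Bs, warp_register_loads, and duplication_stats are all assumptions made for illustration.

```python
# Toy model (assumption, not the paper's method): count how many register
# values in one warp are copies of the same shared-memory element during
# one inner-loop step of a naive tiled GEMM.

TILE = 16        # assumed tile width and block row length; not from the paper
WARP_SIZE = 32   # threads per warp on NVIDIA GPUs


def warp_register_loads(warp_id, k):
    """Return the shared-memory elements each thread of one warp loads at step k."""
    loads = []
    for lane in range(WARP_SIZE):
        tid = warp_id * WARP_SIZE + lane
        ty, tx = divmod(tid, TILE)      # assumed row-major thread layout
        loads.append(("As", ty, k))     # models: a = As[ty][k]
        loads.append(("Bs", k, tx))     # models: b = Bs[k][tx]
    return loads


def duplication_stats(warp_id=0, k=0):
    """Return (registers written, distinct shared-memory elements loaded)."""
    loads = warp_register_loads(warp_id, k)
    return len(loads), len(set(loads))


total, unique = duplication_stats()
print(f"{total} register loads, {unique} distinct values")
# prints "64 register loads, 18 distinct values"
```

Under these assumptions, 46 of the 64 registers written by a warp in one step hold a value that another thread in the same warp also holds. The model shows two visible patterns (one As element repeated across a row of threads, and each Bs element repeated across rows), but the abstract does not say whether these correspond to the two duplication types the authors identify.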

Original language: English
Article number: 103152
Journal: Journal of Systems Architecture
Volume: 152
Publication status: Published - 2024 Jul

Bibliographical note

Publisher Copyright:
© 2024

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
