TY - JOUR
T1 - Comprehensive Design Space Exploration for Graph Neural Network Aggregation on GPUs
AU - Nam, Hyunwoo
AU - Lee, Jay Hwan
AU - Yang, Shinhyung
AU - Kim, Yeonsoo
AU - Jeong, Jiun
AU - Kim, Jeonggeun
AU - Burgstaller, Bernd
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Graph neural networks (GNNs) have become the state-of-the-art technology for extracting and predicting data representations on graphs. With increasing demand to accelerate GNN computations, the GPU has become the dominant platform for GNN training and inference. GNNs consist of a compute-bound combination phase and a memory-bound aggregation phase. The memory access patterns of the aggregation phase remain a major performance bottleneck on GPUs, despite recent microarchitectural enhancements. Although GNN characterizations have been conducted to investigate this bottleneck, they did not reveal the impact of architectural modifications. However, a comprehensive understanding of the improvements such modifications can deliver is imperative to devise GPU optimizations for the aggregation phase. In this letter, we explore the GPU design space for aggregation by assessing the performance improvement potential of a series of architectural modifications. We find that the low locality of aggregation degrades performance as thread-level parallelism increases, and that memory access optimizations yield significant performance gains, which remain effective even in the presence of software optimizations. Our analysis provides insights for hardware optimizations that can significantly improve GNN aggregation on GPUs.
AB - Graph neural networks (GNNs) have become the state-of-the-art technology for extracting and predicting data representations on graphs. With increasing demand to accelerate GNN computations, the GPU has become the dominant platform for GNN training and inference. GNNs consist of a compute-bound combination phase and a memory-bound aggregation phase. The memory access patterns of the aggregation phase remain a major performance bottleneck on GPUs, despite recent microarchitectural enhancements. Although GNN characterizations have been conducted to investigate this bottleneck, they did not reveal the impact of architectural modifications. However, a comprehensive understanding of the improvements such modifications can deliver is imperative to devise GPU optimizations for the aggregation phase. In this letter, we explore the GPU design space for aggregation by assessing the performance improvement potential of a series of architectural modifications. We find that the low locality of aggregation degrades performance as thread-level parallelism increases, and that memory access optimizations yield significant performance gains, which remain effective even in the presence of software optimizations. Our analysis provides insights for hardware optimizations that can significantly improve GNN aggregation on GPUs.
KW - Graph neural networks
KW - graphics processing units
KW - sensitivity analysis
UR - http://www.scopus.com/inward/record.url?scp=85217559173&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217559173&partnerID=8YFLogxK
U2 - 10.1109/LCA.2025.3539371
DO - 10.1109/LCA.2025.3539371
M3 - Article
AN - SCOPUS:85217559173
SN - 1556-6056
VL - 24
SP - 45
EP - 48
JO - IEEE Computer Architecture Letters
JF - IEEE Computer Architecture Letters
IS - 1
ER -