TY - GEN
T1 - GPUdmm
T2 - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
AU - Kim, Youngsok
AU - Lee, Jaewon
AU - Jo, Jae Eon
AU - Kim, Jangwoo
PY - 2014
Y1 - 2014
N2 - GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we propose GPUdmm, a novel GPU architecture to enable high-performance and memory-oblivious GPU programming. First, GPUdmm uses GPU memory as a cache of CPU memory to provide programmers a view of the CPU memory-sized programming space. Second, GPUdmm achieves high performance by exploiting data locality and dynamically transferring data between CPU and GPU memories while effectively overlapping CPU-GPU data transfers and GPU executions. Third, GPUdmm can further reduce unnecessary CPU-GPU data transfers by exploiting simple programmer hints. Our carefully designed and validated experiments (e.g., PCIe/DMA timing) against representative benchmarks show that GPUdmm can achieve up to five times higher performance for the same GPU memory size, or reduce the GPU memory size requirement by up to 75% while maintaining the same performance.
AB - GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we propose GPUdmm, a novel GPU architecture to enable high-performance and memory-oblivious GPU programming. First, GPUdmm uses GPU memory as a cache of CPU memory to provide programmers a view of the CPU memory-sized programming space. Second, GPUdmm achieves high performance by exploiting data locality and dynamically transferring data between CPU and GPU memories while effectively overlapping CPU-GPU data transfers and GPU executions. Third, GPUdmm can further reduce unnecessary CPU-GPU data transfers by exploiting simple programmer hints. Our carefully designed and validated experiments (e.g., PCIe/DMA timing) against representative benchmarks show that GPUdmm can achieve up to five times higher performance for the same GPU memory size, or reduce the GPU memory size requirement by up to 75% while maintaining the same performance.
UR - http://www.scopus.com/inward/record.url?scp=84903949953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903949953&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2014.6835963
DO - 10.1109/HPCA.2014.6835963
M3 - Conference contribution
AN - SCOPUS:84903949953
SN - 9781479930975
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 546
EP - 557
BT - 20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014
PB - IEEE Computer Society
Y2 - 15 February 2014 through 19 February 2014
ER -