AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory

Hongju Kal, Chanyoung Yoo, Won Woo Ro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

This paper presents an asynchronous execution scheme that leverages the bank-level parallelism of near-bank processing-in-memory (PIM). We observe that memory operations underutilize the parallelism of PIM computation because near-bank PIMs are designed to operate all banks synchronously. All-bank computation can be delayed whenever any one bank performs basic memory commands, such as read/write requests and activation/precharge operations. We aim to mitigate this throughput degradation, focusing in particular on the execution delay caused by activation/precharge operations. Because all-bank execution accesses the same row in every bank, a large number of activation/precharge operations inevitably occur. Considering the timing parameter that limits the rate of row-open operations (tFAW), throughput can decrease even further. To resolve this activation/precharge overhead, we propose AESPA, a new parallel execution scheme that operates banks asynchronously. AESPA differs from previous synchronous execution in that (1) each AESPA compute command targets a single bank, and (2) each processing unit computes data stored in multiple DRAM columns. While one bank computes over multiple DRAM columns, the memory controller issues activation/precharge or PIM compute commands to the other banks. Thus, AESPA hides the activation latency of PIM computation and fully utilizes the aggregate bandwidth of the banks. To support this, we modify the hardware and software of previous near-bank PIM architectures for vector and matrix computation. In particular, we restructure inner-product-based matrix-vector multiplication to fit AESPA PIM: previous matrix-vector multiplication requires data broadcasting and simultaneous computation across all processing units, whereas AESPA PIM transfers data to each processing unit and starts computation asynchronously.
As a result, near-bank PIMs adopting AESPA achieve 33.5% and 59.5% speedups compared to two different state-of-the-art PIMs.
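The abstract's key algorithmic change — replacing broadcast-based, lockstep matrix-vector multiplication with independent per-bank inner products — can be sketched with a toy model. This is an illustration, not the paper's hardware design: `NUM_BANKS`, the bank mappings, and both function names are invented here. Both functions compute the same y = A·x; they differ only in how work maps onto banks.

```python
# Toy sketch (not the paper's implementation): two ways to partition a
# matrix-vector product y = A @ x across DRAM banks. NUM_BANKS and the
# row/column-to-bank mappings are illustrative assumptions.

NUM_BANKS = 4

def mv_broadcast_style(A, x):
    """Synchronous style: columns of A are striped across banks, the
    matching element of x is broadcast to every bank, all banks step in
    lockstep, and per-bank partial sums are reduced at the end."""
    rows, cols = len(A), len(x)
    partial = [[0.0] * rows for _ in range(NUM_BANKS)]
    for c in range(cols):        # one lockstep all-bank step per column
        bank = c % NUM_BANKS     # hypothetical column-to-bank striping
        for r in range(rows):
            partial[bank][r] += A[r][c] * x[c]
    # reduction across banks' partial sums
    return [sum(partial[b][r] for b in range(NUM_BANKS)) for r in range(rows)]

def mv_per_bank_rows(A, x):
    """AESPA-like partitioning: each bank owns whole rows, so its
    processing unit computes complete inner products over many DRAM
    columns independently; no cross-bank lockstep is required, letting
    the controller activate/precharge other banks in the meantime."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]
```

In the second form, each row's inner product is self-contained within one bank, which is what allows the memory controller to overlap one bank's computation with activation/precharge commands to the others.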

Original language: English
Title of host publication: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Publisher: Association for Computing Machinery, Inc
Pages: 815-827
Number of pages: 13
ISBN (Electronic): 9798400703294
DOIs
Publication status: Published - 2023 Oct 28
Event: 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023 - Toronto, Canada
Duration: 2023 Oct 28 - 2023 Nov 1

Publication series

Name: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023

Conference

Conference: 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Country/Territory: Canada
City: Toronto
Period: 23/10/28 - 23/11/1

Bibliographical note

Publisher Copyright:
© 2023 ACM.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Renewable Energy, Sustainability and the Environment
