SPACE: Locality-aware processing in heterogeneous memory for personalized recommendations

Hongju Kal, Seokmin Lee, Gun Ko, Won Woo Ro

Research output: Chapter in Book/Report/Conference proceedingConference contribution

32 Citations (Scopus)

Abstract

Personalized recommendation systems have become a major AI application in modern data centers. The main challenges in processing personalized recommendation inferences are the large memory footprint and high bandwidth requirement of embedding layers. To overcome the capacity limit and bandwidth congestion of on-chip memory, near memory processing (NMP) can be a promising solution. Recent work on accelerating personalized recommendations proposes a DIMMbased NMP design to solve the bandwidth problem and increases memory capacity. The performance of NMP is determined by the internal bandwidth and the prior DIMM-based approach utilizes more DIMMs to achieve higher operation throughput. However, extending the number of DIMMs could eventually lead to significant power consumption due to inefficient scaling. We propose SPACE, a novel heterogeneous memory architecture, which is efficient in terms of performance and energy. SPACE exploits a compute-capable 3D-stacked DRAM with DIMMs for personalized recommendations. Prior to designing the proposed system, we give a quantitative analysis of the user/item interactions and define the two localities: gather locality and reduction locality. In gather operations, we find only a small proportion of items are highly-accessed by users, and we call this gather locality. Also, we define reduction locality as the reusability of the gathered items in reduction operations. Based on the gather locality, SPACE allocates highly-accessed embedding items to the 3D-stacked DRAM to achieve the maximum bandwidth. Subsequently, by exploiting reduction locality, we utilize the remaining space of the 3D-stacked DRAM to store and reuse repeated partial sums, thereby minimizing the required number of element-wise reduction operations. As a result, the evaluation shows that SPACE achieves 3.2× performance improvement and 56% energy saving over the previous DIMM-based NMPs leveraging 3D-stacked DRAM with a 1/8 size of DIMMs. Also, compared to the state-of-the-art DRAM cache designs with the same NMP configuration, SPACE achieves an average 32.7% of performance improvement.

Original languageEnglish
Title of host publicationProceedings - 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, ISCA 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages679-691
Number of pages13
ISBN (Electronic)9781665433334
DOIs
Publication statusPublished - 2021 Jun
Event48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 - Virtual, Online, Spain
Duration: 2021 Jun 142021 Jun 19

Publication series

NameProceedings - International Symposium on Computer Architecture
Volume2021-June
ISSN (Print)1063-6897

Conference

Conference48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021
Country/TerritorySpain
CityVirtual, Online
Period21/6/1421/6/19

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'SPACE: Locality-aware processing in heterogeneous memory for personalized recommendations'. Together they form a unique fingerprint.

Cite this