Abstract
Personalized recommendation systems have become a major AI application in modern data centers. The main challenges in processing personalized recommendation inferences are the large memory footprint and high bandwidth requirement of embedding layers. To overcome the capacity limit and bandwidth congestion of on-chip memory, near-memory processing (NMP) is a promising solution. Recent work on accelerating personalized recommendations proposes a DIMM-based NMP design to solve the bandwidth problem and increase memory capacity. The performance of NMP is determined by the internal bandwidth, and the prior DIMM-based approach uses more DIMMs to achieve higher operation throughput. However, increasing the number of DIMMs eventually leads to significant power consumption because of inefficient scaling. We propose SPACE, a novel heterogeneous memory architecture that is efficient in both performance and energy. SPACE combines a compute-capable 3D-stacked DRAM with DIMMs for personalized recommendations. Before designing the proposed system, we give a quantitative analysis of user/item interactions and define two localities: gather locality and reduction locality. In gather operations, we find that only a small proportion of items are highly accessed by users, and we call this gather locality. We define reduction locality as the reusability of the gathered items in reduction operations. Based on gather locality, SPACE allocates highly accessed embedding items to the 3D-stacked DRAM to achieve the maximum bandwidth. Then, by exploiting reduction locality, we use the remaining space of the 3D-stacked DRAM to store and reuse repeated partial sums, thereby minimizing the required number of element-wise reduction operations. The evaluation shows that SPACE achieves a 3.2× performance improvement and 56% energy saving over the previous DIMM-based NMPs, leveraging a 3D-stacked DRAM with 1/8 the size of the DIMMs. Compared to state-of-the-art DRAM cache designs with the same NMP configuration, SPACE achieves an average performance improvement of 32.7%.
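The two localities named in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the SPACE hardware design or the authors' method: the function names (`hot_items`, `frequent_pairs`, `reduce_query`), the toy embedding table, and the choice to cache partial sums of item pairs are all assumptions made for illustration. It measures how much of the lookup traffic a small hot set covers (gather locality) and counts how many element-wise vector additions a query saves when a cached partial sum is reused (reduction locality).

```python
from collections import Counter
from itertools import combinations


def hot_items(queries, n_hot):
    """Return the n_hot most-accessed items and the share of lookups they cover."""
    counts = Counter(item for q in queries for item in q)
    hot = {item for item, _ in counts.most_common(n_hot)}
    covered = sum(counts[i] for i in hot)
    return hot, covered / sum(counts.values())


def frequent_pairs(queries, hot, n_pairs):
    """Return the n_pairs hot-item pairs that co-occur most often in queries."""
    pairs = Counter()
    for q in queries:
        pairs.update(combinations(sorted(set(q) & hot), 2))
    return [p for p, _ in pairs.most_common(n_pairs)]


def vec_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def reduce_query(query, table, psum_cache):
    """Sum the embedding rows of a non-empty query of distinct items.

    Cached pair sums (hypothetical policy) replace two gathers and one add.
    Returns (result_vector, number_of_vector_additions).
    """
    remaining = set(query)
    operands = []
    for (a, b), psum in psum_cache.items():
        if a in remaining and b in remaining:
            operands.append(psum)
            remaining -= {a, b}
    operands.extend(table[i] for i in sorted(remaining))
    result = operands[0]
    for v in operands[1:]:
        result = vec_add(result, v)
    return result, len(operands) - 1


# Toy data (assumed): item id -> embedding row, and per-user item lists.
table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [1.0, 1.0], 4: [3.0, 0.0]}
queries = [[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2]]

hot, coverage = hot_items(queries, n_hot=2)
cache = {p: vec_add(table[p[0]], table[p[1]]) for p in frequent_pairs(queries, hot, 1)}
with_reuse = reduce_query(queries[0], table, cache)
baseline = reduce_query(queries[0], table, {})
```

In this model the hot set stands in for the items placed in the 3D-stacked DRAM, and `cache` stands in for the partial sums kept in its remaining space. Both calls return the same vector, but the one using `cache` performs fewer additions.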
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, ISCA 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 679-691 |
Number of pages | 13 |
ISBN (Electronic) | 9781665433334 |
Publication status | Published - 2021 Jun |
Event | 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 - Virtual, Online, Spain; Duration: 2021 Jun 14 → 2021 Jun 19 |
Publication series
Name | Proceedings - International Symposium on Computer Architecture |
---|---|
Volume | 2021-June |
ISSN (Print) | 1063-6897 |
Conference
Conference | 48th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2021 |
---|---|
Country/Territory | Spain |
City | Virtual, Online |
Period | 2021 Jun 14 → 2021 Jun 19 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture