Abstract
Precision-scalable neural processing units (PSNPUs) efficiently provide native support for quantized neural networks. However, with recent advances in deep neural networks, PSNPUs are affected by a severe memory bottleneck owing to the need to perform an extreme number of simple computations simultaneously. In this study, we first analyze whether the memory bottleneck issue can be solved using conventional neural processing unit scheduling techniques. Subsequently, we introduce new capacity-aware memory allocation and block-level scheduling techniques to minimize the memory bottleneck. Compared with the baseline, the new method achieves up to a 2.26× performance improvement by substantially relieving the memory pressure of low-precision computations without hardware overhead.
Original language | English |
---|---|
Title of host publication | 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9783981926378 |
Publication status | Published - 2023 |
Event | 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Antwerp, Belgium; Duration: 2023 Apr 17 → 2023 Apr 19 |
Publication series
Name | Proceedings - Design, Automation and Test in Europe, DATE |
---|---|
Volume | 2023-April |
ISSN (Print) | 1530-1591 |
Conference
Conference | 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 |
---|---|
Country/Territory | Belgium |
City | Antwerp |
Period | 2023 Apr 17 → 2023 Apr 19 |
Bibliographical note
Publisher Copyright: © 2023 EDAA.
All Science Journal Classification (ASJC) codes
- General Engineering