Block Group Scheduling: A General Precision-scalable NPU Scheduling Technique with Capacity-aware Memory Allocation

Seokho Lee, Younghyun Lee, Hyejun Kim, Taehoon Kim, Yongjun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Precision-scalable neural processing units (PSNPUs) provide efficient native support for quantized neural networks. However, as deep neural networks continue to grow, PSNPUs suffer from a severe memory bottleneck because they must perform an extremely large number of simple computations simultaneously. In this study, we first analyze whether this memory bottleneck can be resolved using conventional neural processing unit scheduling techniques. We then introduce new capacity-aware memory allocation and block-level scheduling techniques that minimize the memory bottleneck. Compared with the baseline, the proposed method achieves up to a 2.26× performance improvement by substantially relieving the memory pressure of low-precision computations without hardware overhead.
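
The abstract only names the technique at a high level. As a rough illustration (not the paper's actual algorithm), a capacity-aware grouping step could resemble the Python sketch below, in which the block footprint model, the on-chip buffer capacity, and the greedy contiguous-grouping policy are all assumptions introduced here for illustration.

# Illustrative sketch only: pack consecutive compute blocks into the largest
# contiguous groups whose combined on-chip footprint fits the buffer capacity,
# so each group can be scheduled as one unit and off-chip traffic is reduced.
# The data model and greedy policy are assumptions, not the paper's method.

from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    name: str
    footprint_bytes: int  # on-chip memory needed while this block executes

def group_blocks(blocks: List[Block], capacity_bytes: int) -> List[List[Block]]:
    """Greedily pack consecutive blocks into groups bounded by buffer capacity."""
    groups: List[List[Block]] = []
    current: List[Block] = []
    used = 0
    for blk in blocks:
        if blk.footprint_bytes > capacity_bytes:
            raise ValueError(f"{blk.name} alone exceeds on-chip capacity")
        if used + blk.footprint_bytes > capacity_bytes and current:
            groups.append(current)   # close the current group and start a new one
            current, used = [], 0
        current.append(blk)
        used += blk.footprint_bytes
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    blocks = [Block("b0", 96_000), Block("b1", 64_000),
              Block("b2", 160_000), Block("b3", 48_000)]
    for i, group in enumerate(group_blocks(blocks, capacity_bytes=256_000)):
        print(f"group {i}: {[b.name for b in group]}")

With the example values above, the sketch forms two groups ([b0, b1] and [b2, b3]), each fitting within the assumed 256 KB buffer.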

Original language: English
Title of host publication: 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9783981926378
DOIs
Publication status: Published - 2023
Event: 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Antwerp, Belgium
Duration: 2023 Apr 17 – 2023 Apr 19

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
Volume: 2023-April
ISSN (Print): 1530-1591

Conference

Conference: 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023
Country/Territory: Belgium
City: Antwerp
Period: 23/4/17 – 23/4/19

Bibliographical note

Publisher Copyright:
© 2023 EDAA.

All Science Journal Classification (ASJC) codes

  • General Engineering
