Abstract
Mixed-precision quantization can reduce the computational requirements of Deep Neural Network (DNN) models with minimal loss of accuracy. Because executing mixed-precision DNN models on Neural Processing Units (NPUs) leaves computational resources significantly under-utilized, Precision-Scalable NPUs (PSNPUs), which can process multiple low-precision layers simultaneously, have been proposed. However, under-utilization remains significant because adequate scheduling algorithms for running multiple mixed-precision models on PSNPUs are lacking. In this paper, we therefore propose a dynamic programming-based scheduling algorithm for the operations of multiple mixed-precision models. The algorithm finds the optimal execution plan that exploits the precision-scalable multiply-accumulate (MAC) units to improve the end-to-end inference latency of mixed-precision models. We evaluate the algorithm in terms of hardware utilization, inference latency, and schedule search time against baseline scheduling algorithms. The experimental results show a 1.23× inference latency improvement over the baseline algorithms within the allowed minutes.
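To make the idea of dynamic programming-based co-scheduling concrete, the following is a minimal toy sketch and not the algorithm from the paper. It assumes just two models, each a chain of layers given as `(latency, bits)` pairs, and a hypothetical pairing rule: two layers may run concurrently when their bit widths sum to at most the native MAC width (`max_bits`), in which case the pair costs the larger of the two latencies. The function name `min_latency`, the cost model, and the pairing rule are all illustrative assumptions.

```python
def min_latency(model_a, model_b, max_bits=8):
    """Toy DP: minimal end-to-end latency for two layer chains.

    Each layer is a (latency, bits) tuple. Layers of one model must run
    in order. ASSUMPTION (not from the paper): layers from different
    models may co-run when bits_a + bits_b <= max_bits, and a co-run
    costs max(latency_a, latency_b).
    """
    n, m = len(model_a), len(model_b)
    inf = float("inf")
    # dp[i][j] = minimal latency to finish model_a[i:] and model_b[j:]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            best = inf
            if i < n:  # run the next layer of model A alone
                best = min(best, model_a[i][0] + dp[i + 1][j])
            if j < m:  # run the next layer of model B alone
                best = min(best, model_b[j][0] + dp[i][j + 1])
            if i < n and j < m and model_a[i][1] + model_b[j][1] <= max_bits:
                # co-run one layer from each model on the scalable MACs
                paired = max(model_a[i][0], model_b[j][0])
                best = min(best, paired + dp[i + 1][j + 1])
            dp[i][j] = best
    return dp[0][0]


# Hypothetical workloads: (latency, bit width) per layer.
model_a = [(4, 8), (3, 4), (2, 4)]
model_b = [(3, 4), (5, 8), (2, 4)]

print(min_latency(model_a, model_b))              # 14 with pairing allowed
print(min_latency(model_a, model_b, max_bits=4))  # 19, no pairs fit
```

In this toy instance, running every layer alone takes 19 time units, while pairing the two compatible 4-bit layers from each model brings the total down to 14. A realistic scheduler would need a hardware-accurate cost model and support for more than two models, which this sketch deliberately omits.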
Original language | English |
---|---|
Title of host publication | LCTES 2024 - Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, Co-located with |
Subtitle of host publication | PLDI 2024 |
Editors | Aviral Shrivastava, Yulei Sui |
Publisher | Association for Computing Machinery |
Pages | 72-82 |
Number of pages | 11 |
ISBN (Electronic) | 9798400706165 |
Publication status | Published - 2024 Jun 20 |
Event | 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2024 - Copenhagen, Denmark; Duration: 2024 Jun 24 → … |
Publication series
Name | Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES) |
---|
Conference
Conference | 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2024 |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 24/6/24 → … |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s).
All Science Journal Classification (ASJC) codes
- Software