Orchestrating Multiple Mixed Precision Models on a Shared Precision-Scalable NPU

Kiung Jung, Seok Namkoong, Hongjun Um, Hyejun Kim, Youngsok Kim, Yongjun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Mixed-precision quantization can reduce the computational requirements of Deep Neural Network (DNN) models with minimal loss of accuracy. Because executing mixed-precision DNN models on Neural Processing Units (NPUs) significantly under-utilizes computational resources, Precision-Scalable NPUs (PSNPUs), which can process multiple low-precision layers simultaneously, have been proposed. However, under-utilization remains significant because adequate scheduling algorithms for running multiple mixed-precision models on PSNPUs are lacking. In this paper, we therefore propose a dynamic programming-based scheduling algorithm for the operations of multiple mixed-precision models. Our scheduling algorithm finds the optimal execution plan that exploits the precision-scalable MACs to improve the end-to-end inference latency of mixed-precision models. We evaluate the algorithm in terms of hardware utilization, inference latency, and schedule search time against baseline scheduling algorithms. The experimental results show a 1.23× inference latency improvement over the baseline algorithms within the allowed minutes.

Original language: English
Title of host publication: LCTES 2024 - Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, Co-located with
Subtitle of host publication: PLDI 2024
Editors: Aviral Shrivastava, Yulei Sui
Publisher: Association for Computing Machinery
Pages: 72-82
Number of pages: 11
ISBN (Electronic): 9798400706165
Publication status: Published - 2024 Jun 20
Event: 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2024 - Copenhagen, Denmark
Duration: 2024 Jun 24 → …

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)

Conference

Conference: 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, LCTES 2024
Country/Territory: Denmark
City: Copenhagen
Period: 24/6/24 → …

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

All Science Journal Classification (ASJC) codes

  • Software
