Discovering Efficient Fused Layer Configurations for Executing Multi-Workloads on Multi-Core NPUs

Younghyun Lee, Hyejun Kim, Yongseung Yu, Myeongjin Cho, Jiwon Seo, Yongjun Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the AI industry grows rapidly, Neural Processing Units (NPUs) have been developed to deliver AI services more efficiently. One of the most important challenges for NPUs is task scheduling to minimize off-chip memory accesses, which may incur significant performance overhead. To reduce memory accesses, multiple convolution layers can be fused into a fused layer group, which offers numerous optimization opportunities. However, in most Convolutional Neural Networks (CNNs), when multiple layers are fused, the on-chip memory utilization of the fused layers gradually decreases, resulting in non-flat memory usage. In this paper, we propose a scheduling search algorithm to optimize the fusion of multiple convolution layers while reducing the peak on-chip memory usage. The proposed algorithm aims to find a schedule that simultaneously optimizes execution time and peak on-chip memory usage, despite a slight increase in off-chip memory accesses. It organizes the search space into a graph of possible partial schedules and then finds the optimal path. As a result of the improved on-chip memory usage, multiple workloads can be executed on multi-core NPUs with increased throughput. Experimental results show that the fusion schedule explored by the proposed method reduced on-chip memory usage by 39%, while increasing latency by 13%. When the freed on-chip memory was allocated to other workloads and the two workloads were executed concurrently in a multi-core NPU, a 32% performance improvement could be achieved.
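
The idea of searching a graph of partial schedules for an optimal path can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Layer` fields, the `group_cost` model, and the hard memory budget are all hypothetical simplifications chosen for illustration. Node `i` stands for "layers 0..i-1 are scheduled", an edge `i -> j` stands for fusing layers `i..j-1` into one group, and a shortest-path pass over this graph picks the lowest-latency fusion that keeps every group under the budget.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    compute: float  # hypothetical compute time of the layer
    in_size: int    # input activation size (arbitrary units)
    out_size: int   # output activation size (arbitrary units)


def group_cost(layers: List[Layer], i: int, j: int,
               bandwidth: float = 1.0) -> Tuple[float, int]:
    """Toy cost model (an assumption, not from the paper) for fusing layers[i:j].

    Returns (latency, peak on-chip memory).
    """
    group = layers[i:j]
    # Only the group's first input and last output cross the off-chip boundary.
    off_chip = group[0].in_size + group[-1].out_size
    latency = sum(l.compute for l in group) + off_chip / bandwidth
    # Toy assumption: every output buffer in the group stays resident on-chip.
    peak = sum(l.out_size for l in group)
    return latency, peak


def best_schedule(layers: List[Layer], mem_budget: int,
                  bandwidth: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Shortest path over the graph of partial schedules (nodes 0..n)."""
    n = len(layers)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: lowest latency to schedule layers[:j]
    prev = [-1] * (n + 1)    # prev[j]: start index of the last fused group
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] == inf:
                continue
            latency, peak = group_cost(layers, i, j, bandwidth)
            if peak > mem_budget:
                continue  # this fused group would not fit on-chip
            if best[i] + latency < best[j]:
                best[j] = best[i] + latency
                prev[j] = i
    if best[n] == inf:
        raise ValueError("no schedule fits the memory budget")
    # Walk the predecessor links back to recover the fused groups.
    groups = []
    j = n
    while j > 0:
        groups.append((prev[j], j))
        j = prev[j]
    groups.reverse()
    return best[n], groups
```

Under this toy model, tightening `mem_budget` forces the search to split the network into smaller fused groups, which lowers peak on-chip memory at the price of more off-chip traffic and higher latency. That is the same kind of trade-off the abstract reports, although the numbers here have no relation to the paper's results.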

Original language: English
Title of host publication: 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350348590
Publication status: Published - 2024
Event: 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Valencia, Spain
Duration: 2024 Mar 25 → 2024 Mar 27

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
ISSN (Print): 1530-1591

Conference

Conference: 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024
Country/Territory: Spain
City: Valencia
Period: 24/3/25 → 24/3/27

Bibliographical note

Publisher Copyright:
© 2024 EDAA.

All Science Journal Classification (ASJC) codes

  • General Engineering
