Abstract
Training a deep neural network (DNN) is expensive, requiring a large amount of computation time. While the training overhead is high, not all computation in DNN training is equal. Some parameters converge faster, so their gradient computation contributes little to the parameter update; near stationary points, a subset of parameters may change very little. In this paper we exploit parameter convergence to optimize gradient computation in DNN training. We design a lightweight monitoring technique to track parameter convergence, and we prune the gradient computation stochastically for groups of semantically related parameters, exploiting their convergence correlations. These techniques are efficiently implemented in existing GPU kernels. In our evaluation, the proposed optimizations substantially and robustly improve training throughput for four DNN models on three public datasets.
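The abstract only outlines the approach. As a rough, hypothetical sketch of the general idea (not the paper's actual design or its GPU-kernel implementation), the following PyTorch-style snippet tracks a smoothed per-parameter-group update magnitude and stochastically freezes groups that appear converged, so their weight-gradient computation is skipped on later iterations. All names and thresholds (`ConvergenceMonitor`, `tau`, `p_skip`, `beta`) are assumptions made for illustration.

```python
# Illustrative sketch only: approximates "prune gradient computation for
# converged parameter groups" by stochastically toggling requires_grad.
# Assumes the training loop calls optimizer.zero_grad(set_to_none=True),
# so frozen groups have grad None and are skipped by the optimizer.
import random
import torch


class ConvergenceMonitor:
    def __init__(self, model, tau=1e-4, p_skip=0.5, beta=0.9):
        self.model = model
        self.tau = tau        # threshold on the smoothed update magnitude
        self.p_skip = p_skip  # probability of pruning a converged group's gradient
        self.beta = beta      # smoothing factor for the moving average
        self.prev = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.ema = {n: None for n, _ in model.named_parameters()}

    @torch.no_grad()
    def step(self):
        """Call once per iteration, after optimizer.step()."""
        for name, p in self.model.named_parameters():
            delta = (p.detach() - self.prev[name]).abs().mean().item()
            self.prev[name].copy_(p.detach())
            if self.ema[name] is None:
                self.ema[name] = delta
            else:
                self.ema[name] = self.beta * self.ema[name] + (1 - self.beta) * delta
            converged = self.ema[name] < self.tau
            # Stochastic pruning: a converged group is frozen only with
            # probability p_skip, so it can still recover if it drifts.
            p.requires_grad_(not (converged and random.random() < self.p_skip))


if __name__ == "__main__":
    # Toy usage: minimize the squared output of a linear layer on random data.
    model = torch.nn.Linear(8, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    monitor = ConvergenceMonitor(model)
    for _ in range(100):
        opt.zero_grad(set_to_none=True)
        loss = model(torch.randn(16, 8)).pow(2).mean()
        if loss.requires_grad:  # all groups may be frozen on some iterations
            loss.backward()
            opt.step()
        monitor.step()
```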
| Original language | English |
| --- | --- |
| Title of host publication | 2020 57th ACM/IEEE Design Automation Conference, DAC 2020 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781450367257 |
| DOIs | |
| Publication status | Published - 2020 Jul |
| Event | 57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States. Duration: 2020 Jul 20 → 2020 Jul 24 |
Publication series
| Name | Proceedings - Design Automation Conference |
| --- | --- |
| Volume | 2020-July |
| ISSN (Print) | 0738-100X |
Conference
| Conference | 57th ACM/IEEE Design Automation Conference, DAC 2020 |
| --- | --- |
| Country/Territory | United States |
| City | Virtual, San Francisco |
| Period | 2020 Jul 20 → 2020 Jul 24 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
- Modelling and Simulation