TMA: Tera-MACs/W neural hardware inference accelerator with a multiplier-less massive parallel processor

Hyunbin Park, Dohyun Kim, Shiho Kim

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Computationally intensive inference tasks of deep neural networks have driven a revolution in accelerator architecture aimed at reducing both power consumption and latency. The key figure of merit for hardware inference accelerators is the number of multiply-and-accumulate operations per watt (MACs/W); the state of the art, so far, has been several hundred Giga-MACs/W. We propose a Tera-MACs/W neural hardware inference accelerator (TMA) with 8-bit activations and scalable integer weights of less than one byte. The architecture's main feature is a configurable neural processing element for matrix-vector operations. The proposed neural processing element is a massively parallel processor that performs multiply-and-accumulate operations without multipliers, which makes it attractive for energy-efficient, high-performance neural network applications. We benchmark our system's latency, power, and performance using AlexNet trained on ImageNet, and compare the accelerator's throughput and power consumption with those of prior works. The proposed accelerator outperforms state-of-the-art counterparts in terms of energy and area efficiency, achieving 2.3 TMACs/W on a 28-nm Virtex-7 FPGA chip.
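The abstract does not spell out how multiplication is avoided, but the standard way to realize a multiplier-less MAC with small integer weights is shift-and-add decomposition: each set bit of the weight contributes a shifted copy of the activation. The C sketch below is a minimal illustration of that idea under this assumption; the function names (`shift_add_mul`, `dot_product`) and structure are hypothetical and not taken from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of a multiplier-less MAC: multiply an 8-bit
 * activation by a small integer weight using only shifts and adds
 * (shift-and-add decomposition). Not the paper's actual design. */
static int32_t shift_add_mul(uint8_t activation, int8_t weight)
{
    int32_t acc = 0;
    uint8_t w = (uint8_t)(weight < 0 ? -weight : weight);
    for (int bit = 0; w != 0; bit++, w >>= 1) {
        if (w & 1)                               /* for each set bit of the weight... */
            acc += (int32_t)activation << bit;   /* ...add a shifted copy of the activation */
    }
    return weight < 0 ? -acc : acc;
}

/* Multiplier-less dot product: the matrix-vector kernel a neural
 * processing element would evaluate, one output element at a time. */
int32_t dot_product(const uint8_t *act, const int8_t *wgt, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += shift_add_mul(act[i], wgt[i]);
    return acc;
}

int main(void)
{
    uint8_t act[4] = {10, 20, 30, 40};
    int8_t  wgt[4] = {3, -1, 2, 5};
    printf("%d\n", dot_product(act, wgt, 4)); /* 30 - 20 + 60 + 200 = 270 */
    return 0;
}
```

In hardware, each iteration of the inner loop maps to a wire shift and an adder rather than a sequential step, so narrow integer weights keep the adder tree small; this is consistent with the abstract's emphasis on scalable sub-byte weights.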

Original language: English
Pages (from-to): 1399-1409
Number of pages: 11
Journal: International Journal of Circuit Theory and Applications
Volume: 49
Issue number: 5
Publication status: Published - May 2021

Bibliographical note

Publisher Copyright:
© 2021 John Wiley & Sons, Ltd.

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Applied Mathematics

