TY - GEN
T1 - Libra
T2 - 2012 IEEE/ACM 45th International Symposium on Microarchitecture, MICRO 2012
AU - Park, Yongjun
AU - Park, Jason Jong Kyu
AU - Park, Hyunchul
AU - Mahlke, Scott
PY - 2012
Y1 - 2012
AB - Mobile computing, as exemplified by the smart phone, has become an integral part of our daily lives. The next generation of these devices will be driven by providing an even richer user experience and compelling capabilities: higher definition multimedia, 3D graphics, augmented reality, games, and voice interfaces. To address these goals, the core computing capabilities of the smart phone must be scaled. However, the energy budgets are increasing at a much lower rate, requiring fundamental improvements in computing efficiency. SIMD accelerators offer the combination of high performance and low energy consumption through low control and interconnect overhead. However, SIMD accelerators are not a panacea. Many applications lack sufficient vector parallelism to effectively utilize a large number of SIMD lanes. Further, the use of symmetric hardware lanes leads to low utilization and high static power dissipation as SIMD width is scaled. To address these inefficiencies, this paper focuses on breaking two traditional rules of SIMD processing: homogeneity and static configuration. The Libra accelerator increases SIMD utility by blurring the divide between vector and instruction parallelism to support efficient execution of a wider range of loops, and it increases hardware utilization through the use of heterogeneous hardware across the SIMD lanes. Experimental results show that the 32-lane Libra outperforms traditional SIMD accelerators by an average of 1.58x, owing to higher loop coverage, while consuming 29% less energy through heterogeneous hardware.
UR - http://www.scopus.com/inward/record.url?scp=84876586321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876586321&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2012.17
DO - 10.1109/MICRO.2012.17
M3 - Conference contribution
AN - SCOPUS:84876586321
SN - 9780769549248
T3 - Proceedings - 2012 IEEE/ACM 45th International Symposium on Microarchitecture, MICRO 2012
SP - 84
EP - 95
BT - Proceedings - 2012 IEEE/ACM 45th International Symposium on Microarchitecture, MICRO 2012
Y2 - 1 December 2012 through 5 December 2012
ER -