TY - GEN
T1 - Efficient execution of augmented reality applications on mobile programmable accelerators
AU - Park, Jason Jong Kyu
AU - Park, Yongjun
AU - Mahlke, Scott
PY - 2013
Y1 - 2013
N2 - Mobile devices are ubiquitous in daily life. From smartphones to tablets, customers constantly demand richer user experiences through more visual and interactive interfaces with prolonged battery life. To meet these demands, accelerators are commonly adopted in systems-on-chip (SoCs) for various applications. Coarse-grained reconfigurable architecture (CGRA) is a promising solution, which accelerates hot loops with software pipelining. Although CGRAs have been shown to support multimedia applications efficiently, more interactive applications such as augmented reality put much more pressure on performance and energy requirements. In this paper, we extend a heterogeneous CGRA to provide SIMD capabilities, which improves performance and energy efficiency significantly for augmented reality applications. We show that when data-level parallelism (DLP) can be exploited, it is more beneficial to run it natively on SIMD than to transform it into instruction-level parallelism (ILP) and run it on the CGRA. To exploit this property, multiple processing elements in the CGRA are grouped to form homogeneous SIMD cores. To reduce the hardware overhead of fetching and replicating configurations in SIMD mode, we propose a ring network and a recycle buffer to pass the configuration around and to store it temporarily, with minimal impact on throughput. We also modify the memory access units and memory banks to support split memory transactions with forwarding for handling SIMD data accesses. To adapt to the proposed extension, we introduce a compilation technique for SIMD-mode code generation that maximizes the resource utilization of each SIMD core. Experimental results show that it is possible to achieve an average performance improvement of 17.6% while saving 16.9% energy over a heterogeneous CGRA.
UR - http://www.scopus.com/inward/record.url?scp=84894158024&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894158024&partnerID=8YFLogxK
U2 - 10.1109/FPT.2013.6718350
DO - 10.1109/FPT.2013.6718350
M3 - Conference contribution
AN - SCOPUS:84894158024
SN - 9781479921990
T3 - FPT 2013 - Proceedings of the 2013 International Conference on Field Programmable Technology
SP - 176
EP - 183
BT - FPT 2013 - Proceedings of the 2013 International Conference on Field Programmable Technology
T2 - 2013 12th International Conference on Field-Programmable Technology, FPT 2013
Y2 - 9 December 2013 through 11 December 2013
ER -