TY - GEN
T1 - CGRA express
T2 - Embedded Systems Week 2009, ESWEEK 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
AU - Park, Yongjun
AU - Park, Hyunchul
AU - Mahlke, Scott
PY - 2009
Y1 - 2009
N2 - Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used for innermost loops that contain an abundant of instruction-level parallelism. Conversely, non-loop and outer-loop code are latency constrained and do not offer significant amounts of instruction-level parallelism. In these situations, CGRAs are ineffective as the majority of the resources remain idle. In this paper, dynamic operation fusion is introduced to enable CGRAs to effectively accelerate latency-constrained code regions. Dynamic operation fusion is enabled through the combination of a small bypass network added between function units in a conventional CGRA and a sub-cycle modulo scheduler to automatically identify opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.
AB - Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used for innermost loops that contain an abundant of instruction-level parallelism. Conversely, non-loop and outer-loop code are latency constrained and do not offer significant amounts of instruction-level parallelism. In these situations, CGRAs are ineffective as the majority of the resources remain idle. In this paper, dynamic operation fusion is introduced to enable CGRAs to effectively accelerate latency-constrained code regions. Dynamic operation fusion is enabled through the combination of a small bypass network added between function units in a conventional CGRA and a sub-cycle modulo scheduler to automatically identify opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.
UR - http://www.scopus.com/inward/record.url?scp=72049107514&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72049107514&partnerID=8YFLogxK
U2 - 10.1145/1629395.1629433
DO - 10.1145/1629395.1629433
M3 - Conference contribution
AN - SCOPUS:72049107514
SN - 9781605586267
T3 - Embedded Systems Week 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
SP - 271
EP - 280
BT - Embedded Systems Week 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
Y2 - 11 October 2009 through 16 October 2009
ER -