TY - GEN
T1 - Profile-Guided deployment of stream programs on multicores
AU - Farhad, S. M.
AU - Ko, Yousun
AU - Burgstaller, Bernd
AU - Scholz, Bernhard
PY - 2012
Y1 - 2012
N2 - Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.
AB - Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.
UR - http://www.scopus.com/inward/record.url?scp=84864134748&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864134748&partnerID=8YFLogxK
U2 - 10.1145/2248418.2248430
DO - 10.1145/2248418.2248430
M3 - Conference contribution
AN - SCOPUS:84864134748
SN - 9781450312127
T3 - Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
SP - 79
EP - 88
BT - Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012
T2 - 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES 2012
Y2 - 12 June 2012 through 13 June 2012
ER -